[PATCH v3 00/27] EEH Support for PowerNV platform

linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH v3 00/27] EEH Support for PowerNV platform
@ 2013-06-05  7:34 Gavin Shan
  2013-06-05  7:34 ` [PATCH 01/27] powerpc/eeh: Fix fetching bus for single-dev-PE Gavin Shan
                   ` (27 more replies)
  0 siblings, 28 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The series was initially built on 3.10.RC1. It does not intend to enable
EEH functionality for PHB3 for now; PHB3 EEH support on the PowerNV
platform is left for future work.

The series adds EEH support for the PowerNV platform. The EEH core already
supports multiple probe methods: device tree nodes and PCI devices. For EEH
on PowerNV, we use PCI devices to do the EEH probe, which differs from the
probe type used on the pSeries platform. Another point I should mention is
that EEH as a whole is split into 3 layers: EEH core, platform layer and
I/O chip layer. The split gives EEH on PowerNV more flexibility and makes it
easier to support more I/O chips in future. Besides, an EEH event can be
produced either by detecting 0xFF's when reading PCI config or I/O
registers, or by interrupts dedicated to EEH error reporting, so we have to
handle the EEH error interrupts as well. Once produced, the EEH events are
processed by the EEH core just as on the pSeries platform.
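
The all-ones check mentioned above can be sketched as follows. This is a
minimal, self-contained illustration, not code from the series:
io_error_value() and looks_like_eeh_error() are hypothetical names that
play the role of the EEH_IO_ERROR_VALUE(size) test used in
rtas_read_config(). An all-ones read is only a hint, so the real code
still has to query the PE state before raising an event.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * All-ones pattern for a read of `size` bytes. Hypothetical helper;
 * `size` must be 1, 2 or 4 (size 0 would shift by 32 bits, which is
 * undefined behaviour in C).
 */
static uint32_t io_error_value(unsigned int size)
{
	return UINT32_MAX >> ((4u - size) * 8u);
}

/* True if a config or MMIO read of `size` bytes returned only 0xFF's. */
static bool looks_like_eeh_error(uint32_t val, unsigned int size)
{
	return val == io_error_value(size);
}
```

A caller would only go on to the comparatively expensive PE state check
when looks_like_eeh_error() returns true.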

The series exports debugfs entries ("/sys/kernel/debug/powerpc/PCIxxxx/err_injct"),
which allow you to control the 0xD10 register in order to force errors like
frozen PE and fenced PHB for testing purposes. The examples below are what I
usually use to control that register. The patchset has been verified on a
Firebird-L machine where I have 2 Emulex ethernet cards on PHB#0. I keep
pinging one of the ethernet cards (eth0) from an external host and then use
the following commands to produce frozen PE or fenced PHB errors. Eventually,
the errors are recovered and the ethernet card is reachable again after a
temporary loss of connection.

Trigger frozen PE:

	echo 0x0000000002000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct
	sleep 1
	echo 0x0 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Trigger fenced PHB:

	echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Change log
==========

v2 -> v3:
	* Rebase to 3.10.RC4
	* Replace eeh_pci_dev_traverse() with pci_walk_bus()
	* Changelog adjustments to make it clearer
	* Call msleep() if possible after opal_pci_poll()
	* Make sure we have OPALv3
	* OPAL notifier so that we can register callback for the monitored events.
	  The OPAL notifier is disabled while restarting or powering off the system.
	* Make the debugfs entries something like (PCIxxxx/err_injct)
	* Split the patch so that it can be backported to stable kernel
	* Allow detecting fenced PHB proactively (without interrupt)
	* Start to use opal_pci_get_phb_diag_data2()
	* Stack dump upon fenced PHB
v1 -> v2:
	* Rebase to 3.10.RC3
	* Don't fetch PE state for the case of fenced PHB. It usually takes a long
	  time and possibly incurs soft lockup warnings. It requires corresponding
	  changes in the underlying firmware
	* Add debugfs entries so that we can inject errors like frozen PE and
	  fenced PHB for testing purpose

---

arch/powerpc/include/asm/eeh.h                 |   26 +-
arch/powerpc/include/asm/eeh_event.h           |    6 +-
arch/powerpc/include/asm/opal.h                |  138 +++++-
arch/powerpc/kernel/rtas_pci.c                 |    3 +-
arch/powerpc/platforms/powernv/Makefile        |    1 +
arch/powerpc/platforms/powernv/eeh-ioda.c      |  550 ++++++++++++++++++++++++
arch/powerpc/platforms/powernv/eeh-powernv.c   |  387 +++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
arch/powerpc/platforms/powernv/opal.c          |   79 ++++-
arch/powerpc/platforms/powernv/pci-err.c       |  481 +++++++++++++++++++++
arch/powerpc/platforms/powernv/pci-ioda.c      |   38 ++-
arch/powerpc/platforms/powernv/pci-p5ioc2.c    |    6 +-
arch/powerpc/platforms/powernv/pci.c           |   44 ++-
arch/powerpc/platforms/powernv/pci.h           |   26 ++
arch/powerpc/platforms/powernv/setup.c         |    4 +
arch/powerpc/platforms/pseries/eeh.c           |  120 +++++-
arch/powerpc/platforms/pseries/eeh_event.c     |   12 +-
arch/powerpc/platforms/pseries/eeh_pe.c        |   31 ++-
18 files changed, 1907 insertions(+), 48 deletions(-)
create mode 100644 arch/powerpc/platforms/powernv/eeh-ioda.c
create mode 100644 arch/powerpc/platforms/powernv/eeh-powernv.c
create mode 100644 arch/powerpc/platforms/powernv/pci-err.c

Thanks,
Gavin


* [PATCH 01/27] powerpc/eeh: Fix fetching bus for single-dev-PE
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 02/27] powerpc/eeh: Enhance converting EEH dev Gavin Shan
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Steve Best, Gavin Shan

While running Linux as a guest on top of phyp, we may have a PE
that includes a single PCI device. However, we didn't return its
PCI bus correctly, which leads to failure on recovery from EEH
errors for single-dev-PE. The patch fixes the issue.
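
The lookup described above can be sketched outside the kernel as
follows. This is a simplified model under stated assumptions: the
*_stub types, the PE_* values and pe_bus_get() are stand-ins invented
for illustration, not the real struct eeh_pe or EEH_PE_* definitions.

```c
#include <stddef.h>

/* Stand-in flag values; the real EEH_PE_* constants live in asm/eeh.h. */
enum {
	PE_PHB    = 1 << 0,
	PE_DEVICE = 1 << 1,
	PE_BUS    = 1 << 2,
};

struct bus_stub {
	int number;
};

struct pe_stub {
	int type;
	struct bus_stub *phb_bus;	/* root bus of the owning PHB */
	struct bus_stub *first_dev_bus;	/* bus of the first device, may be NULL */
};

/*
 * A single-device PE is resolved the same way as a bus PE: through
 * the bus of its first device. Before the fix, the device case fell
 * through and no bus was returned.
 */
static struct bus_stub *pe_bus_get(const struct pe_stub *pe)
{
	if (pe->type & PE_PHB)
		return pe->phb_bus;
	if (pe->type & (PE_BUS | PE_DEVICE))
		return pe->first_dev_bus;
	return NULL;
}
```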

Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pe.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index fe43d1a..9d4a9e8 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -639,7 +639,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 
 	if (pe->type & EEH_PE_PHB) {
 		bus = pe->phb->bus;
-	} else if (pe->type & EEH_PE_BUS) {
+	} else if (pe->type & EEH_PE_BUS ||
+		   pe->type & EEH_PE_DEVICE) {
 		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		pdev = eeh_dev_to_pci_dev(edev);
 		if (pdev)
-- 
1.7.5.4


* [PATCH 02/27] powerpc/eeh: Enhance converting EEH dev
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
  2013-06-05  7:34 ` [PATCH 01/27] powerpc/eeh: Fix fetching bus for single-dev-PE Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

Under some special circumstances, the EEH device doesn't have an
associated device tree node or PCI device. The patch enhances the
functions that convert an EEH device to a device tree node or PCI
device so that they return NULL for a NULL EEH device, which avoids
an unnecessary system crash.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a80e32b4..e32c3c5 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -95,12 +95,12 @@ struct eeh_dev {
 
 static inline struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
 {
-	return edev->dn;
+	return edev ? edev->dn : NULL;
 }
 
 static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 {
-	return edev->pdev;
+	return edev ? edev->pdev : NULL;
 }
 
 /*
-- 
1.7.5.4


* [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
  2013-06-05  7:34 ` [PATCH 01/27] powerpc/eeh: Fix fetching bus for single-dev-PE Gavin Shan
  2013-06-05  7:34 ` [PATCH 02/27] powerpc/eeh: Enhance converting EEH dev Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

One of the possible cases indicated by a P7IOC interrupt is a fenced
PHB. In that case, we need to fetch the PE corresponding to the PHB,
disable the PHB and all subordinate PCI buses/devices, recover from
the fenced state and eventually re-enable the whole PHB. We need a
function to fetch the PHB PE from outside eeh_pe.c, so the patch
makes eeh_phb_pe_get() public for that purpose.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h          |    1 +
 arch/powerpc/platforms/pseries/eeh_pe.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e32c3c5..4ac6f70 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -184,6 +184,7 @@ static inline void eeh_unlock(void)
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 9d4a9e8..71c4544 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -95,7 +95,7 @@ int eeh_phb_pe_create(struct pci_controller *phb)
  * hierarchy tree is composed of PHB PEs. The function is used
  * to retrieve the corresponding PHB PE according to the given PHB.
  */
-static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
 {
 	struct eeh_pe *pe;
 
-- 
1.7.5.4


* [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (2 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While processing an EEH event interrupt from P7IOC, we need a function
to retrieve the PE for the indicated EEH device. The patch makes
eeh_pe_get() public so that other source files can call it for that
purpose. The patch also fixes a reference to the wrong BDF
(Bus/Device/Function) address while searching for the PE in
__eeh_pe_get().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h          |    1 +
 arch/powerpc/platforms/pseries/eeh_pe.c |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 4ac6f70..acdfcaa 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -185,6 +185,7 @@ static inline void eeh_unlock(void)
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 71c4544..3d2dcf5 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -228,7 +228,7 @@ static void *__eeh_pe_get(void *data, void *flag)
 		return pe;
 
 	/* Try BDF address */
-	if (edev->pe_config_addr &&
+	if (edev->config_addr &&
 	   (edev->config_addr == pe->config_addr))
 		return pe;
 
@@ -246,7 +246,7 @@ static void *__eeh_pe_get(void *data, void *flag)
  * which is composed of PCI bus/device/function number, or unified
  * PE address.
  */
-static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
 {
 	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
 	struct eeh_pe *pe;
-- 
1.7.5.4


* [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (3 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

Several types of PE are supported for now: PHB, bus and device
dependent PEs. For a PCI bus dependent PE, tracing the corresponding
PCI bus from the PE (struct eeh_pe) makes the code more efficient.
The patch also enables retrieving the PCI bus from the PCI bus
dependent PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h          |    1 +
 arch/powerpc/platforms/pseries/eeh_pe.c |   22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index acdfcaa..8ba1c39 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -62,6 +62,7 @@ struct eeh_pe {
 	int check_count;		/* Times of ignored error	*/
 	int freeze_count;		/* Times of froze up		*/
 	int false_positives;		/* Times of reported #ff's	*/
+	struct pci_bus *bus;		/* Top PCI bus for bus PE	*/
 	struct eeh_pe *parent;		/* Parent PE			*/
 	struct list_head child_list;	/* Link PE to the child list	*/
 	struct list_head edevs;		/* Link list of EEH devices	*/
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 3d2dcf5..03f8223 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -304,6 +304,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 int eeh_add_to_parent_pe(struct eeh_dev *edev)
 {
 	struct eeh_pe *pe, *parent;
+	struct eeh_dev *first_edev;
 
 	eeh_lock();
 
@@ -326,6 +327,21 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 		pe->type = EEH_PE_BUS;
 		edev->pe = pe;
 
+		/*
+		 * For PCI bus sensitive PE, we can reset the parent
+		 * bridge in order for hot-reset. However, the PCI
+		 * devices including the associated EEH devices might
+		 * be removed when EEH core is doing recovery. So that
+		 * won't safe to retrieve the bridge through downstream
+		 * EEH device. We have to trace the parent PCI bus, then
+		 * the parent bridge explicitly.
+		 */
+		if (eeh_probe_mode_dev() && !pe->bus) {
+			first_edev = list_first_entry(&pe->edevs,
+						struct eeh_dev, list);
+			pe->bus = eeh_dev_to_pci_dev(first_edev)->bus;
+		}
+
 		/* Put the edev to PE */
 		list_add_tail(&edev->list, &pe->edevs);
 		eeh_unlock();
@@ -641,12 +657,18 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
 		   pe->type & EEH_PE_DEVICE) {
+		if (pe->bus) {
+			bus = pe->bus;
+			goto out;
+		}
+
 		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		pdev = eeh_dev_to_pci_dev(edev);
 		if (pdev)
 			bus = pdev->bus;
 	}
 
+out:
 	eeh_unlock();
 
 	return bus;
-- 
1.7.5.4


* [PATCH 06/27] powerpc/eeh: Make eeh_init() public
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (4 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

For EEH on the PowerNV platform, we do the EEH probe based on the
real PCI devices, which are only available after PCI probe. So we
have to call eeh_init() explicitly on the PowerNV platform after
PCI probe. The patch also does the EEH probe for the PowerNV
platform in eeh_init().
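
The deferral can be pictured with a small stand-alone model of the
static call counter the patch adds to eeh_init(). This is only a
sketch: eeh_init_sketch(), its defer_first_call parameter (standing in
for machine_is(powernv)) and the probes_done counter are made up for
illustration.

```c
#include <stdbool.h>

static int probes_done;	/* how many times the probe step really ran */

/*
 * Model of the guard: when deferral is requested, the first call
 * returns without probing and every later call does the real work.
 */
static int eeh_init_sketch(bool defer_first_call)
{
	static int cnt;

	if (defer_first_call && cnt++ <= 0)
		return 0;

	probes_done++;	/* stands in for the actual EEH probe */
	return 0;
}
```

With deferral requested, the first invocation is a no-op and the second
one, made once the PCI hierarchy exists, performs the probe.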

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h       |    8 +++++++-
 arch/powerpc/platforms/pseries/eeh.c |   22 ++++++++++++++++++++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8ba1c39..72611fa 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -132,7 +132,7 @@ struct eeh_ops {
 	char *name;
 	int (*init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
-	void* (*dev_probe)(struct pci_dev *dev, void *flag);
+	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
 	int (*get_pe_addr)(struct eeh_pe *pe);
 	int (*get_state)(struct eeh_pe *pe, int *state);
@@ -196,6 +196,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+int __init eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
@@ -224,6 +225,11 @@ void eeh_remove_bus_device(struct pci_dev *, int);
 
 #else /* !CONFIG_EEH */
 
+static inline int eeh_init(void)
+{
+	return 0;
+}
+
 static inline void *eeh_dev_init(struct device_node *dn, void *data)
 {
 	return NULL;
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 6b73d6c..dbc9d83 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -674,11 +674,21 @@ int __exit eeh_ops_unregister(const char *name)
  * Even if force-off is set, the EEH hardware is still enabled, so that
  * newer systems can boot.
  */
-static int __init eeh_init(void)
+int __init eeh_init(void)
 {
 	struct pci_controller *hose, *tmp;
 	struct device_node *phb;
-	int ret;
+	static int cnt = 0;
+	int ret = 0;
+
+	/*
+	 * We have to delay the initialization on PowerNV after
+	 * the PCI hierarchy tree has been built because the PEs
+	 * are figured out based on PCI devices instead of device
+	 * tree nodes
+	 */
+	if (machine_is(powernv) && cnt++ <= 0)
+		return ret;
 
 	/* call platform initialization function */
 	if (!eeh_ops) {
@@ -700,6 +710,14 @@ static int __init eeh_init(void)
 			phb = hose->dn;
 			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
 		}
+	} else if (eeh_probe_mode_dev()) {
+		list_for_each_entry_safe(hose, tmp,
+			&hose_list, list_node)
+			pci_walk_bus(hose->bus, eeh_ops->dev_probe, NULL);
+	} else {
+		pr_warning("%s: Invalid probe mode %d\n",
+			__func__, eeh_probe_mode);
+		return -EINVAL;
 	}
 
 	if (eeh_subsystem_enabled)
-- 
1.7.5.4


* [PATCH 07/27] powerpc/eeh: EEH post initialization operation
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (5 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds a new EEH operation, post_init. It is used to notify
the platform that the EEH core has completed the EEH probe. From that
point on, the PowerNV platform can start using the services supplied
by the EEH functionality.
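
As a rough illustration of an optional hook of this kind, the sketch
below calls post_init only when a platform provides it. The ops_stub
structure, run_post_init() and the two sample hooks are hypothetical
stand-ins, not the real struct eeh_ops.

```c
#include <stddef.h>

struct ops_stub {
	int (*init)(void);
	int (*post_init)(void);	/* optional, may be NULL */
};

/* Sample hooks, for illustration only. */
static int post_init_ok(void)
{
	return 0;
}

static int post_init_fail(void)
{
	return -1;
}

/*
 * Call the optional hook once probing is done. A platform without
 * the hook is treated as success; a hook error is passed back.
 */
static int run_post_init(const struct ops_stub *ops)
{
	if (ops->post_init)
		return ops->post_init();
	return 0;
}
```

Platforms that do not need the notification simply leave the pointer
NULL and are unaffected.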

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h       |    1 +
 arch/powerpc/platforms/pseries/eeh.c |   11 +++++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 72611fa..a577d78 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -131,6 +131,7 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
+	int (*post_init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
 	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index dbc9d83..ef3b02a 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -720,6 +720,17 @@ int __init eeh_init(void)
 		return -EINVAL;
 	}
 
+	/*
+	 * Call platform post-initialization. Actually, It's good chance
+	 * to inform platform that EEH is ready to supply service if the
+	 * I/O cache stuff has been built up.
+	 */
+	if (eeh_ops->post_init) {
+		ret = eeh_ops->post_init();
+		if (ret)
+			return ret;
+	}
+
 	if (eeh_subsystem_enabled)
 		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
 	else
-- 
1.7.5.4


* [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once()
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (6 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

We shouldn't check that the returned PE status is exactly equal to
(EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) but instead only check
that they are both set.
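
The difference between the two checks shows up as soon as the state
word carries any additional bit. The sketch below uses stand-in values
(ST_MMIO_ACTIVE, ST_DMA_ACTIVE, ST_EXTRA) rather than the real
EEH_STATE_* constants. It also rejects negative return codes before
masking, which is a choice made for this sketch and not something the
patch does: in two's complement a negative error code may happen to
have both bits set.

```c
#include <stdbool.h>

/* Stand-in bit values, not the real EEH_STATE_* definitions. */
#define ST_MMIO_ACTIVE	(1 << 0)
#define ST_DMA_ACTIVE	(1 << 1)
#define ST_EXTRA	(1 << 2)	/* any other bit that may be reported */

/* Old style: fails as soon as any additional bit is set. */
static bool pe_ready_exact(int rc)
{
	return rc == (ST_MMIO_ACTIVE | ST_DMA_ACTIVE);
}

/* New style: only requires that both bits of interest are set. */
static bool pe_ready_masked(int rc)
{
	int flags = ST_MMIO_ACTIVE | ST_DMA_ACTIVE;

	/* Sketch-only guard: treat negative codes as errors first. */
	return rc >= 0 && (rc & flags) == flags;
}
```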

[benh: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index ef3b02a..ba707104 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -565,6 +565,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
  */
 int eeh_reset_pe(struct eeh_pe *pe)
 {
+	int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 	int i, rc;
 
 	/* Take three shots at resetting the bus */
@@ -572,7 +573,7 @@ int eeh_reset_pe(struct eeh_pe *pe)
 		eeh_reset_pe_once(pe);
 
 		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
+		if ((rc & flags) == flags)
 			return 0;
 
 		if (rc < 0) {
-- 
1.7.5.4


* [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (7 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 10/27] powerpc/eeh: Differentiate EEH events Gavin Shan
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While doing EEH recovery, the PCI devices of the problematic PE
should be removed and then added to the system again. During this
so-called hotplug event, the PCI devices of the problematic PE
are probed in an early and a late phase. On the PowerNV platform we
delay the EEH probe to the late phase, since the PCI device isn't
available in the early phase.
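
The split between the two phases can be modelled in a few lines. This
is a sketch only: the probe_mode enum, the two functions and the
counters are hypothetical, while the real code tests
eeh_probe_mode_devtree() and eeh_probe_mode_dev().

```c
enum probe_mode {
	PROBE_DEVTREE,	/* probe from device tree nodes */
	PROBE_DEV,	/* probe from PCI devices */
};

static int early_probes;	/* probes done in the early phase */
static int late_probes;		/* probes done in the late phase */

/* Early phase: only a device-tree based probe can run here. */
static void add_device_early(enum probe_mode mode)
{
	if (mode != PROBE_DEVTREE)
		return;
	early_probes++;
}

/* Late phase: the PCI device exists now, so probe from it. */
static void add_device_late(enum probe_mode mode)
{
	if (mode == PROBE_DEV)
		late_probes++;
}
```

Each device is therefore probed exactly once, in whichever phase suits
the platform's probe mode.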

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index ba707104..17d86d7 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -758,6 +758,14 @@ static void eeh_add_device_early(struct device_node *dn)
 {
 	struct pci_controller *phb;
 
+	/*
+	 * If we're doing EEH probe based on PCI device, we
+	 * would delay the probe until late stage because
+	 * the PCI device isn't available this moment.
+	 */
+	if (!eeh_probe_mode_devtree())
+		return;
+
 	if (!of_node_to_eeh_dev(dn))
 		return;
 	phb = of_node_to_eeh_dev(dn)->phb;
@@ -766,7 +774,6 @@ static void eeh_add_device_early(struct device_node *dn)
 	if (NULL == phb || 0 == phb->buid)
 		return;
 
-	/* FIXME: hotplug support on POWERNV */
 	eeh_ops->of_probe(dn, NULL);
 }
 
@@ -817,6 +824,13 @@ static void eeh_add_device_late(struct pci_dev *dev)
 	edev->pdev = dev;
 	dev->dev.archdata.edev = edev;
 
+	/*
+	 * We have to do the EEH probe here because the PCI device
+	 * hasn't been created yet in the early stage.
+	 */
+	if (eeh_probe_mode_dev())
+		eeh_ops->dev_probe(dev, NULL);
+
 	eeh_addr_cache_insert_dev(dev);
 }
 
-- 
1.7.5.4


* [PATCH 10/27] powerpc/eeh: Differentiate EEH events
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (8 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

An EEH event is usually produced because 0xFF's are returned from
PCI config or I/O registers. The PowerNV platform can also produce
EEH events through interrupts. The patch differentiates the EEH
events by the way they were produced so that they can be processed
differently in future.
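
How a consumer might later tell the two sources apart is sketched
below. The flag names and values match the ones this patch adds to
eeh_event.h; event_source() is a hypothetical helper that exists only
in this sketch.

```c
#define EEH_EVENT_NORMAL	(1 << 0)	/* detected via all-ones read */
#define EEH_EVENT_INT		(1 << 1)	/* reported via interrupt */

/* Hypothetical helper: name the origin of an event from its flag. */
static const char *event_source(int flag)
{
	if (flag & EEH_EVENT_INT)
		return "interrupt";
	if (flag & EEH_EVENT_NORMAL)
		return "all-ones read";
	return "unknown";
}
```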

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h             |    4 ++--
 arch/powerpc/include/asm/eeh_event.h       |    6 +++++-
 arch/powerpc/kernel/rtas_pci.c             |    3 ++-
 arch/powerpc/platforms/pseries/eeh.c       |    7 ++++---
 arch/powerpc/platforms/pseries/eeh_event.c |    4 +++-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a577d78..d1fd5d4 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -202,7 +202,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
 				unsigned long val);
-int eeh_dev_check_failure(struct eeh_dev *edev);
+int eeh_dev_check_failure(struct eeh_dev *edev, int flag);
 void __init eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
@@ -243,7 +243,7 @@ static inline unsigned long eeh_check_failure(const volatile void __iomem *token
 	return val;
 }
 
-#define eeh_dev_check_failure(x) (0)
+#define eeh_dev_check_failure(x, f) (0)
 
 static inline void eeh_addr_cache_build(void) { }
 
diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index de67d83..7e00f23 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -26,12 +26,16 @@
  * to this struct is passed as the data pointer in a notify
  * callback.
  */
+#define EEH_EVENT_NORMAL	(1 << 0)
+#define EEH_EVENT_INT		(1 << 1)
+
 struct eeh_event {
+	int			flag;	/* Event flag		*/
 	struct list_head	list;	/* to form event queue	*/
 	struct eeh_pe		*pe;	/* EEH PE		*/
 };
 
-int eeh_send_failure_event(struct eeh_pe *pe);
+int eeh_send_failure_event(struct eeh_pe *pe, int flag);
 void eeh_handle_event(struct eeh_pe *pe);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 6e7b7cd..8d26f92 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -39,6 +39,7 @@
 #include <asm/mpic.h>
 #include <asm/ppc-pci.h>
 #include <asm/eeh.h>
+#include <asm/eeh_event.h>
 
 /* RTAS tokens */
 static int read_pci_config;
@@ -81,7 +82,7 @@ int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	if (returnval == EEH_IO_ERROR_VALUE(size) &&
-	    eeh_dev_check_failure(of_node_to_eeh_dev(pdn->node)))
+	    eeh_dev_check_failure(of_node_to_eeh_dev(pdn->node), EEH_EVENT_NORMAL))
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	return PCIBIOS_SUCCESSFUL;
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 17d86d7..a42b410 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -272,6 +272,7 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 /**
  * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
  * @edev: eeh device
+ * @flag: EEH event flag
  *
  * Check for an EEH failure for the given device node.  Call this
  * routine if the result of a read was all 0xff's and you want to
@@ -283,7 +284,7 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
  *
  * It is safe to call this routine in an interrupt context.
  */
-int eeh_dev_check_failure(struct eeh_dev *edev)
+int eeh_dev_check_failure(struct eeh_dev *edev, int flag)
 {
 	int ret;
 	unsigned long flags;
@@ -376,7 +377,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
 	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
 
-	eeh_send_failure_event(pe);
+	eeh_send_failure_event(pe, flag);
 
 	/* Most EEH events are due to device driver bugs.  Having
 	 * a stack trace will help the device-driver authors figure
@@ -417,7 +418,7 @@ unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned lon
 		return val;
 	}
 
-	eeh_dev_check_failure(edev);
+	eeh_dev_check_failure(edev, EEH_EVENT_NORMAL);
 
 	pci_dev_put(eeh_dev_to_pci_dev(edev));
 	return val;
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 185bedd..1f86b80 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -114,12 +114,13 @@ static void eeh_thread_launcher(struct work_struct *dummy)
 /**
  * eeh_send_failure_event - Generate a PCI error event
  * @pe: EEH PE
+ * @flag: EEH event flag
  *
  * This routine can be called within an interrupt context;
  * the actual event will be delivered in a normal context
  * (from a workqueue).
  */
-int eeh_send_failure_event(struct eeh_pe *pe)
+int eeh_send_failure_event(struct eeh_pe *pe, int flag)
 {
 	unsigned long flags;
 	struct eeh_event *event;
@@ -129,6 +130,7 @@ int eeh_send_failure_event(struct eeh_pe *pe)
 		pr_err("EEH: out of memory, event not handled\n");
 		return -ENOMEM;
 	}
+	event->flag = flag;
 	event->pe = pe;
 
 	/* We may or may not be called in an interrupt context */
-- 
1.7.5.4


* [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (9 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 10/27] powerpc/eeh: Differentiate EEH events Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch synchronizes the OPAL API definitions in the kernel with
those in the firmware. It also starts replacing
opal_pci_get_phb_diag_data() with the similar
opal_pci_get_phb_diag_data2(); the former OPAL API returns
OPAL_UNSUPPORTED from now on.
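
As a rough illustration of what the new common header allows, a consumer
of the diag blob could check the header before interpreting the rest of
the buffer. The sketch below is not part of the patch: the helper name
diag_blob_is_p7ioc() is hypothetical, "len" is assumed to be the total
blob length including the header, and the buffer is assumed to be in
host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values and layout copied from the opal.h changes in this patch. */
enum { OPAL_PHB_ERROR_DATA_VERSION_1 = 1 };
enum { OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1 };

struct OpalIoPhbErrorCommon {
	uint32_t version;
	uint32_t ioType;
	uint32_t len;
};

/*
 * Hypothetical helper (not in the patch): return 1 if buf starts with
 * a version-1 P7IOC header whose length fits in buf_len, 0 otherwise.
 * Assumes "len" counts the whole blob, header included.
 */
static int diag_blob_is_p7ioc(const void *buf, size_t buf_len)
{
	struct OpalIoPhbErrorCommon hdr;

	if (!buf || buf_len < sizeof(hdr))
		return 0;

	/* Copy out the header to avoid alignment assumptions on buf */
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.version != OPAL_PHB_ERROR_DATA_VERSION_1)
		return 0;
	if (hdr.ioType != OPAL_PHB_ERROR_DATA_TYPE_P7IOC)
		return 0;

	return hdr.len >= sizeof(hdr) && hdr.len <= buf_len;
}
```

A caller would run such a check on phb->diag.blob after
opal_pci_get_phb_diag_data2() returns OPAL_SUCCESS and only then cast
the buffer to struct OpalIoP7IOCPhbErrorData.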

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                |  135 ++++++++++++++++++++----
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/pci.c           |    3 +-
 3 files changed, 119 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index cbb9305..2880797 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -117,7 +117,13 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_SET_SLOT_LED_STATUS		55
 #define OPAL_GET_EPOW_STATUS			56
 #define OPAL_SET_SYSTEM_ATTENTION_LED		57
+#define OPAL_RESERVED1				58
+#define OPAL_RESERVED2				59
+#define OPAL_PCI_NEXT_ERROR			60
+#define OPAL_PCI_EEH_FREEZE_STATUS2		61
+#define OPAL_PCI_POLL				62
 #define OPAL_PCI_MSI_EOI			63
+#define OPAL_PCI_GET_PHB_DIAG_DATA2		64
 
 #ifndef __ASSEMBLY__
 
@@ -125,6 +131,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 enum OpalVendorApiTokens {
 	OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
 };
+
 enum OpalFreezeState {
 	OPAL_EEH_STOPPED_NOT_FROZEN = 0,
 	OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
@@ -134,55 +141,69 @@ enum OpalFreezeState {
 	OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
 	OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
 };
+
 enum OpalEehFreezeActionToken {
 	OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
 };
+
 enum OpalPciStatusToken {
-	OPAL_EEH_PHB_NO_ERROR = 0,
-	OPAL_EEH_PHB_FATAL = 1,
-	OPAL_EEH_PHB_RECOVERABLE = 2,
-	OPAL_EEH_PHB_BUS_ERROR = 3,
-	OPAL_EEH_PCI_NO_DEVSEL = 4,
-	OPAL_EEH_PCI_TA = 5,
-	OPAL_EEH_PCIEX_UR = 6,
-	OPAL_EEH_PCIEX_CA = 7,
-	OPAL_EEH_PCI_MMIO_ERROR = 8,
-	OPAL_EEH_PCI_DMA_ERROR = 9
+	OPAL_EEH_NO_ERROR	= 0,
+	OPAL_EEH_IOC_ERROR	= 1,
+	OPAL_EEH_PHB_ERROR	= 2,
+	OPAL_EEH_PE_ERROR	= 3,
+	OPAL_EEH_PE_MMIO_ERROR	= 4,
+	OPAL_EEH_PE_DMA_ERROR	= 5
 };
+
+enum OpalPciErrorSeverity {
+	OPAL_EEH_SEV_NO_ERROR	= 0,
+	OPAL_EEH_SEV_IOC_DEAD	= 1,
+	OPAL_EEH_SEV_PHB_DEAD	= 2,
+	OPAL_EEH_SEV_PHB_FENCED	= 3,
+	OPAL_EEH_SEV_PE_ER	= 4,
+	OPAL_EEH_SEV_INF	= 5
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
 };
+
 enum OpalShpcLinkState {
 	OPAL_SHPC_LINK_DOWN = 0,
 	OPAL_SHPC_LINK_UP = 1
 };
+
 enum OpalMmioWindowType {
 	OPAL_M32_WINDOW_TYPE = 1,
 	OPAL_M64_WINDOW_TYPE = 2,
 	OPAL_IO_WINDOW_TYPE = 3
 };
+
 enum OpalShpcSlotState {
 	OPAL_SHPC_DEV_NOT_PRESENT = 0,
 	OPAL_SHPC_DEV_PRESENT = 1
 };
+
 enum OpalExceptionHandler {
 	OPAL_MACHINE_CHECK_HANDLER = 1,
 	OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
 	OPAL_SOFTPATCH_HANDLER = 3
 };
+
 enum OpalPendingState {
-	OPAL_EVENT_OPAL_INTERNAL = 0x1,
-	OPAL_EVENT_NVRAM = 0x2,
-	OPAL_EVENT_RTC = 0x4,
-	OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
-	OPAL_EVENT_CONSOLE_INPUT = 0x10,
-	OPAL_EVENT_ERROR_LOG_AVAIL = 0x20,
-	OPAL_EVENT_ERROR_LOG = 0x40,
-	OPAL_EVENT_EPOW = 0x80,
-	OPAL_EVENT_LED_STATUS = 0x100
+	OPAL_EVENT_OPAL_INTERNAL	= 0x1,
+	OPAL_EVENT_NVRAM		= 0x2,
+	OPAL_EVENT_RTC			= 0x4,
+	OPAL_EVENT_CONSOLE_OUTPUT	= 0x8,
+	OPAL_EVENT_CONSOLE_INPUT	= 0x10,
+	OPAL_EVENT_ERROR_LOG_AVAIL	= 0x20,
+	OPAL_EVENT_ERROR_LOG		= 0x40,
+	OPAL_EVENT_EPOW			= 0x80,
+	OPAL_EVENT_LED_STATUS		= 0x100,
+	OPAL_EVENT_PCI_ERROR		= 0x200
 };
 
 /* Machine check related definitions */
@@ -364,15 +385,80 @@ struct opal_machine_check_event {
 	} u;
 };
 
+enum {
+	OPAL_P7IOC_DIAG_TYPE_NONE	= 0,
+	OPAL_P7IOC_DIAG_TYPE_RGC	= 1,
+	OPAL_P7IOC_DIAG_TYPE_BI		= 2,
+	OPAL_P7IOC_DIAG_TYPE_CI		= 3,
+	OPAL_P7IOC_DIAG_TYPE_MISC	= 4,
+	OPAL_P7IOC_DIAG_TYPE_I2C	= 5,
+	OPAL_P7IOC_DIAG_TYPE_LAST	= 6
+};
+
+struct OpalIoP7IOCErrorData {
+	uint16_t type;
+
+	/* GEM */
+	uint64_t gemXfir;
+	uint64_t gemRfir;
+	uint64_t gemRirqfir;
+	uint64_t gemMask;
+	uint64_t gemRwof;
+
+	/* LEM */
+	uint64_t lemFir;
+	uint64_t lemErrMask;
+	uint64_t lemAction0;
+	uint64_t lemAction1;
+	uint64_t lemWof;
+
+	union {
+		struct OpalIoP7IOCRgcErrorData {
+			uint64_t rgcStatus;		/* 3E1C10 */
+			uint64_t rgcLdcp;		/* 3E1C18 */
+		}rgc;
+		struct OpalIoP7IOCBiErrorData {
+			uint64_t biLdcp0;		/* 3C0100, 3C0118 */
+			uint64_t biLdcp1;		/* 3C0108, 3C0120 */
+			uint64_t biLdcp2;		/* 3C0110, 3C0128 */
+			uint64_t biFenceStatus;		/* 3C0130, 3C0130 */
+
+			uint8_t  biDownbound;		/* BI Downbound or Upbound */
+		}bi;
+		struct OpalIoP7IOCCiErrorData {
+			uint64_t ciPortStatus;		/* 3Dn008 */
+			uint64_t ciPortLdcp;		/* 3Dn010 */
+
+			uint8_t	 ciPort;		/* Index of CI port: 0/1 */
+		}ci;
+	};
+};
+
 /**
  * This structure defines the overlay which will be used to store PHB error
  * data upon request.
  */
 enum {
+	OPAL_PHB_ERROR_DATA_VERSION_1 = 1,
+};
+
+enum {
+	OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1,
+};
+
+enum {
 	OPAL_P7IOC_NUM_PEST_REGS = 128,
 };
 
+struct OpalIoPhbErrorCommon {
+	uint32_t version;
+	uint32_t ioType;
+	uint32_t len;
+};
+
 struct OpalIoP7IOCPhbErrorData {
+	struct OpalIoPhbErrorCommon common;
+
 	uint32_t brdgCtl;
 
 	// P7IOC utl regs
@@ -530,14 +616,21 @@ int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint64_t pci_mem_size);
 int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
 
-int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer, uint64_t diag_buffer_len);
-int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer, uint64_t diag_buffer_len);
+int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
+				    uint64_t diag_buffer_len);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint8_t reinit_scope);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
 int64_t opal_set_slot_led_status(uint64_t phb_id, uint64_t slot_id, uint8_t led_type, uint8_t led_action);
 int64_t opal_get_epow_status(uint64_t *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
+int64_t opal_pci_next_error(uint64_t phb_id, uint64_t *first_frozen_pe,
+			    uint16_t *pci_error_type, uint16_t *severity);
+int64_t opal_pci_poll(uint64_t phb_id);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6fabe92..e88863f 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -107,4 +107,7 @@ OPAL_CALL(opal_pci_mask_pe_error,		OPAL_PCI_MASK_PE_ERROR);
 OPAL_CALL(opal_set_slot_led_status,		OPAL_SET_SLOT_LED_STATUS);
 OPAL_CALL(opal_get_epow_status,			OPAL_GET_EPOW_STATUS);
 OPAL_CALL(opal_set_system_attention_led,	OPAL_SET_SYSTEM_ATTENTION_LED);
+OPAL_CALL(opal_pci_next_error,			OPAL_PCI_NEXT_ERROR);
+OPAL_CALL(opal_pci_poll,			OPAL_PCI_POLL);
 OPAL_CALL(opal_pci_msi_eoi,			OPAL_PCI_MSI_EOI);
+OPAL_CALL(opal_pci_get_phb_diag_data2,		OPAL_PCI_GET_PHB_DIAG_DATA2);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 277343c..20af220 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -202,7 +202,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
 
 	spin_lock_irqsave(&phb->lock, flags);
 
-	rc = opal_pci_get_phb_diag_data(phb->opal_id, phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
+					 PNV_PCI_DIAG_BUF_SIZE);
 	has_diag = (rc == OPAL_SUCCESS);
 
 	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
-- 
1.7.5.4


* [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (10 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

EEH on the PowerNV platform has a somewhat different architecture
from EEH on the pSeries platform. In order to support multiple I/O
chips in future, EEH on PowerNV is split into three layers: the EEH
core, the platform layer, and the I/O chip layer. The split leaves
room to add backends for other I/O chips later.

The patch adds the EEH backend for P7IOC.
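
To illustrate the layering, the platform layer can forward a request
to whatever I/O chip backend is attached to the PHB. The following is
only a sketch with stand-in types: platform_set_option() and the demo
backend are hypothetical names, and returning -EEXIST for a missing
backend is an illustrative choice, not something this patch defines.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel structures, reduced to what the sketch needs */
struct eeh_pe {
	int addr;
};

struct pnv_eeh_ops {
	int (*set_option)(struct eeh_pe *pe, int option);
};

struct pnv_phb {
	struct pnv_eeh_ops *eeh_ops;
	int eeh_enabled;
};

/* Hypothetical I/O chip backend: accepts every request */
static int demo_set_option(struct eeh_pe *pe, int option)
{
	(void)pe;
	(void)option;
	return 0;
}

static struct pnv_eeh_ops demo_eeh_ops = {
	.set_option = demo_set_option,
};

/*
 * Hypothetical platform-layer wrapper: dispatch to the chip backend
 * if one is registered, otherwise fail. NULL callbacks are expected
 * while the backend is only partially implemented.
 */
static int platform_set_option(struct pnv_phb *phb, struct eeh_pe *pe,
			       int option)
{
	if (!phb || !phb->eeh_ops || !phb->eeh_ops->set_option)
		return -EEXIST;

	return phb->eeh_ops->set_option(pe, option);
}
```

With this shape, supporting another I/O chip only means providing a
different struct pnv_eeh_ops instance; the EEH core and the platform
layer stay unchanged.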

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile   |    1 +
 arch/powerpc/platforms/powernv/eeh-ioda.c |   53 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |   22 ++++++++++++
 3 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-ioda.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index bcc3cb4..09bd0cb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,3 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
new file mode 100644
index 0000000..b344576
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -0,0 +1,53 @@
+/*
+ * The file intends to implement the functions needed by EEH, which is
+ * built on IODA compliant chip. Actually, lots of functions related
+ * to EEH would be built based on the OPAL APIs.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/bootmem.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/tce.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+struct pnv_eeh_ops ioda_eeh_ops = {
+	.post_init		= NULL,
+	.set_option		= NULL,
+	.get_state		= NULL,
+	.reset			= NULL,
+	.get_log		= NULL,
+	.configure_bridge	= NULL
+};
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 25d76c4..1770188 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -66,15 +66,34 @@ struct pnv_ioda_pe {
 	struct list_head	list;
 };
 
+/* IOC dependent EEH operations */
+#ifdef CONFIG_EEH
+struct pnv_eeh_ops {
+	int (*post_init)(struct pci_controller *hose);
+	int (*set_option)(struct eeh_pe *pe, int option);
+	int (*get_state)(struct eeh_pe *pe, int *state);
+	int (*reset)(struct eeh_pe *pe, int option);
+	int (*get_log)(struct eeh_pe *pe, int severity,
+		       char *drv_log, unsigned long len);
+	int (*configure_bridge)(struct eeh_pe *pe);
+};
+#endif /* CONFIG_EEH */
+
 struct pnv_phb {
 	struct pci_controller	*hose;
 	enum pnv_phb_type	type;
 	enum pnv_phb_model	model;
+	u64			hub_id;
 	u64			opal_id;
 	void __iomem		*regs;
 	int			initialized;
 	spinlock_t		lock;
 
+#ifdef CONFIG_EEH
+	struct pnv_eeh_ops	*eeh_ops;
+	int			eeh_enabled;
+#endif
+
 #ifdef CONFIG_PCI_MSI
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
@@ -150,6 +169,9 @@ struct pnv_phb {
 };
 
 extern struct pci_ops pnv_pci_ops;
+#ifdef CONFIG_EEH
+extern struct pnv_eeh_ops ioda_eeh_ops;
+#endif
 
 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 				      void *tce_mem, u64 tce_size,
-- 
1.7.5.4


* [PATCH 13/27] powerpc/eeh: I/O chip post initialization
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (11 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The post initialization (struct eeh_ops::post_init) is called after
the EEH probe is done. On the PowerNV platform, the EEH core's post
initialization calls into the platform layer, which in turn calls
the I/O chip backend.

The patch adds the I/O chip backend, which notifies the platform
that the specific PHB is ready to supply EEH service.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index b344576..6640d4f 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -43,8 +43,27 @@
 #include "powernv.h"
 #include "pci.h"
 
+/**
+ * ioda_eeh_post_init - Chip dependent post initialization
+ * @hose: PCI controller
+ *
+ * The function will be called after eeh PEs and devices
+ * have been built. That means the EEH is ready to supply
+ * service with I/O cache.
+ */
+static int ioda_eeh_post_init(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	/* FIXME: Enable it for PHB3 later */
+	if (phb->type == PNV_PHB_IODA1)
+		phb->eeh_enabled = 1;
+
+	return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
-	.post_init		= NULL,
+	.post_init		= ioda_eeh_post_init,
 	.set_option		= NULL,
 	.get_state		= NULL,
 	.reset			= NULL,
-- 
1.7.5.4


* [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (12 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds the backend to enable or disable EEH functionality
for the specified PE. The backend is also used to re-enable the MMIO
or DMA path for a problematic PE. Note that all PEs on the PowerNV
platform support EEH by default, and disabling EEH for a specific
PE is not allowed.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   65 ++++++++++++++++++++++++++++-
 1 files changed, 64 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 6640d4f..e24622e 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -62,9 +62,72 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	return 0;
 }
 
+/**
+ * ioda_eeh_set_option - Set EEH operation or I/O setting
+ * @pe: EEH PE
+ * @option: options
+ *
+ * Enable or disable EEH option for the indicated PE. The
+ * function also can be used to enable I/O or DMA for the
+ * PE.
+ */
+static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
+{
+	s64 ret;
+	u32 pe_no;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	/* Check on PE number */
+	if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+		pr_err("%s: PE address %x out of range [0, %x] "
+		       "on PHB#%x\n",
+			__func__, pe->addr, phb->ioda.total_pe,
+			hose->global_number);
+		return -EINVAL;
+	}
+
+	pe_no = pe->addr;
+	switch (option) {
+	case EEH_OPT_DISABLE:
+		ret = -EEXIST;
+		break;
+	case EEH_OPT_ENABLE:
+		ret = 0;
+		break;
+	case EEH_OPT_THAW_MMIO:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO);
+		if (ret) {
+			pr_warning("%s: Failed to enable MMIO for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	case EEH_OPT_THAW_DMA:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_DMA);
+		if (ret) {
+			pr_warning("%s: Failed to enable DMA for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	default:
+		pr_warning("%s: Invalid option %d\n", __func__, option);
+		return -EINVAL;
+	}
+
+	return ret;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
-	.set_option		= NULL,
+	.set_option		= ioda_eeh_set_option,
 	.get_state		= NULL,
 	.reset			= NULL,
 	.get_log		= NULL,
-- 
1.7.5.4


* [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (13 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-11  7:37   ` Benjamin Herrenschmidt
  2013-06-05  7:34 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds the I/O chip backend to retrieve the state of the
indicated PE. When the PE state is temporarily unavailable, the
backend returns the default wait time (1000ms).
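
The mapping from OPAL freeze state to EEH state flags can also be read
as a lookup table. The sketch below is a standalone restatement, not
the patch's code: the EEH_STATE_* bit values are placeholders (the real
ones live in asm/eeh.h), freeze states 2 to 4 are assumed to follow
declaration order since the quoted hunk elides them, and the names
fstate_to_state() and EEH_DEFAULT_WAIT_MS are hypothetical.

```c
#include <stddef.h>

/* Freeze states; 0, 1, 5 and 6 are visible in the opal.h hunk of
 * this series, 2 to 4 are assumed to fill the gap in order. */
enum {
	OPAL_EEH_STOPPED_NOT_FROZEN		= 0,
	OPAL_EEH_STOPPED_MMIO_FREEZE		= 1,
	OPAL_EEH_STOPPED_DMA_FREEZE		= 2,
	OPAL_EEH_STOPPED_MMIO_DMA_FREEZE	= 3,
	OPAL_EEH_STOPPED_RESET			= 4,
	OPAL_EEH_STOPPED_TEMP_UNAVAIL		= 5,
	OPAL_EEH_STOPPED_PERM_UNAVAIL		= 6
};

/* Placeholder bit values, for illustration only */
#define EEH_STATE_UNAVAILABLE	(1 << 0)
#define EEH_STATE_NOT_SUPPORT	(1 << 1)
#define EEH_STATE_RESET_ACTIVE	(1 << 2)
#define EEH_STATE_MMIO_ACTIVE	(1 << 3)
#define EEH_STATE_DMA_ACTIVE	(1 << 4)
#define EEH_STATE_MMIO_ENABLED	(1 << 5)
#define EEH_STATE_DMA_ENABLED	(1 << 6)

#define EEH_DEFAULT_WAIT_MS	1000	/* hypothetical name */

static const int fstate_map[] = {
	[OPAL_EEH_STOPPED_NOT_FROZEN] =
		EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE |
		EEH_STATE_MMIO_ENABLED | EEH_STATE_DMA_ENABLED,
	[OPAL_EEH_STOPPED_MMIO_FREEZE] =
		EEH_STATE_DMA_ACTIVE | EEH_STATE_DMA_ENABLED,
	[OPAL_EEH_STOPPED_DMA_FREEZE] =
		EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED,
	[OPAL_EEH_STOPPED_MMIO_DMA_FREEZE]	= 0,
	[OPAL_EEH_STOPPED_RESET]		= EEH_STATE_RESET_ACTIVE,
	[OPAL_EEH_STOPPED_TEMP_UNAVAIL]		= EEH_STATE_UNAVAILABLE,
	[OPAL_EEH_STOPPED_PERM_UNAVAIL]		= EEH_STATE_NOT_SUPPORT,
};

/*
 * Translate a freeze state into EEH state flags. For the temporarily
 * unavailable case the suggested wait time is stored in *delay_ms.
 * Unknown states yield no flags, like the default case in the patch.
 */
static int fstate_to_state(unsigned int fstate, int *delay_ms)
{
	if (fstate >= sizeof(fstate_map) / sizeof(fstate_map[0]))
		return 0;

	if (fstate == OPAL_EEH_STOPPED_TEMP_UNAVAIL && delay_ms)
		*delay_ms = EEH_DEFAULT_WAIT_MS;

	return fstate_map[fstate];
}
```

The switch statement in the patch and this table express the same
mapping; the table is only meant to make the per-state flags easy to
compare at a glance.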

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |  102 ++++++++++++++++++++++++++++-
 1 files changed, 101 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index e24622e..3c72321 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
 	return ret;
 }
 
+/**
+ * ioda_eeh_get_state - Retrieve the state of PE
+ * @pe: EEH PE
+ * @state: return value
+ *
+ * The PE's state should be retrieved from the PEEV and PEST
+ * IODA tables. Since OPAL already exports a function to do
+ * that, it is better to use it.
+ */
+static int ioda_eeh_get_state(struct eeh_pe *pe, int *state)
+{
+	s64 ret = 0;
+	u8 fstate;
+	u16 pcierr;
+	u32 pe_no;
+	int result;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	/*
+	 * Sanity check on PE address. The PHB PE address should
+	 * be zero.
+	 */
+	if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+		pr_err("%s: PE address %x out of range [0, %x] "
+		       "on PHB#%x\n",
+			__func__, pe->addr, phb->ioda.total_pe,
+			hose->global_number);
+		return EEH_STATE_NOT_SUPPORT;
+	}
+
+	/* Retrieve PE status through OPAL */
+	pe_no = pe->addr;
+	ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+			&fstate, &pcierr, NULL);
+	if (ret) {
+		pr_err("%s: Failed to get EEH status on "
+		       "PHB#%x-PE#%x, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return EEH_STATE_NOT_SUPPORT;
+	}
+
+	/* Check PHB status */
+	if (pe->type & EEH_PE_PHB) {
+		result = 0;
+		result &= ~EEH_STATE_RESET_ACTIVE;
+
+		if (pcierr != OPAL_EEH_PHB_ERROR) {
+			result |= EEH_STATE_MMIO_ACTIVE;
+			result |= EEH_STATE_DMA_ACTIVE;
+			result |= EEH_STATE_MMIO_ENABLED;
+			result |= EEH_STATE_DMA_ENABLED;
+		}
+
+		return result;
+	}
+
+	/* Parse result out */
+	result = 0;
+	switch (fstate) {
+	case OPAL_EEH_STOPPED_NOT_FROZEN:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_MMIO_ACTIVE;
+		result |= EEH_STATE_DMA_ACTIVE;
+		result |= EEH_STATE_MMIO_ENABLED;
+		result |= EEH_STATE_DMA_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_MMIO_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_DMA_ACTIVE;
+		result |= EEH_STATE_DMA_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_DMA_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_MMIO_ACTIVE;
+		result |= EEH_STATE_MMIO_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		break;
+	case OPAL_EEH_STOPPED_RESET:
+		result |= EEH_STATE_RESET_ACTIVE;
+		break;
+	case OPAL_EEH_STOPPED_TEMP_UNAVAIL:
+		result |= EEH_STATE_UNAVAILABLE;
+		if (state)
+			*state = 1000;
+		break;
+	case OPAL_EEH_STOPPED_PERM_UNAVAIL:
+		result |= EEH_STATE_NOT_SUPPORT;
+		break;
+	default:
+		pr_warning("%s: Unexpected EEH status 0x%x "
+			   "on PHB#%x-PE#%x\n",
+			__func__, fstate, hose->global_number, pe_no);
+	}
+
+	return result;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
-	.get_state		= NULL,
+	.get_state		= ioda_eeh_get_state,
 	.reset			= NULL,
 	.get_log		= NULL,
 	.configure_bridge	= NULL
-- 
1.7.5.4


* [PATCH 16/27] powerpc/eeh: I/O chip PE reset
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (14 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds the I/O chip backend to do PE reset. For now, we
focus on PCI bus dependent PEs. If the PHB PE has been put into
error state, the PHB takes a complete reset. Otherwise, if the
indicated PE sits at the top of the PCI hierarchy tree, the root
bridge takes a fundamental or hot reset as requested; if not, the
upstream p2p bridge takes a hot reset.
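
The reset rules above boil down to a small decision function. The
sketch below restates them with hypothetical names (enum reset_target,
pick_reset_target()); it only models the choice of reset and does not
call OPAL or touch config space. As in the patch, a bridge on bus 0 is
treated as the root bridge.

```c
/* Hypothetical labels for the reset paths described above */
enum reset_target {
	RESET_PHB_COMPLETE,	/* PHB PE: complete PHB reset */
	RESET_ROOT_FUNDAMENTAL,	/* root bridge, fundamental reset */
	RESET_ROOT_HOT,		/* root bridge, hot reset */
	RESET_BRIDGE_HOT	/* upstream p2p bridge, always hot */
};

/*
 * pe_is_phb:         non-zero if the PE is the PHB PE
 * upstream_bus:      bus number the direct upstream bridge sits on
 * want_fundamental:  non-zero if a fundamental reset was requested
 */
static enum reset_target pick_reset_target(int pe_is_phb,
					   int upstream_bus,
					   int want_fundamental)
{
	if (pe_is_phb)
		return RESET_PHB_COMPLETE;

	/* The upstream bridge is the root bridge (port) */
	if (upstream_bus == 0)
		return want_fundamental ? RESET_ROOT_FUNDAMENTAL
					: RESET_ROOT_HOT;

	/* Any other bridge only gets a hot reset */
	return RESET_BRIDGE_HOT;
}
```

For example, a fundamental reset request for a PE behind a p2p bridge
on bus 2 would still end up as a hot reset of that bridge.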

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |  233 ++++++++++++++++++++++++++++-
 1 files changed, 232 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 3c72321..e6c8c7f 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -225,11 +225,242 @@ static int ioda_eeh_get_state(struct eeh_pe *pe, int *state)
 	return result;
 }
 
+static int ioda_eeh_pe_clear(struct eeh_pe *pe)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	u32 pe_no;
+	u8 fstate;
+	u16 pcierr;
+	s64 ret;
+
+	pe_no = pe->addr;
+	hose = pe->phb;
+	phb = pe->phb->private_data;
+
+	/* Clear the EEH error on the PE */
+	ret = opal_pci_eeh_freeze_clear(phb->opal_id,
+			pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+	if (ret) {
+		pr_err("%s: Failed to clear EEH error for "
+		       "PHB#%x-PE#%x, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return -EIO;
+	}
+
+	/*
+	 * Read the PE state back and verify that the frozen
+	 * state has been removed.
+	 */
+	ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+			&fstate, &pcierr, NULL);
+	if (ret) {
+		pr_err("%s: Failed to get EEH status on "
+		       "PHB#%x-PE#%x, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return -EIO;
+	}
+	if (fstate != OPAL_EEH_STOPPED_NOT_FROZEN) {
+		pr_err("%s: Frozen state not cleared on "
+		       "PHB#%x-PE#%x, sts=%x\n",
+			__func__, hose->global_number, pe_no, fstate);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static s64 ioda_eeh_phb_poll(struct pnv_phb *phb)
+{
+	s64 rc = OPAL_HARDWARE;
+
+	while (1) {
+		rc = opal_pci_poll(phb->opal_id);
+		if (rc <= 0)
+			break;
+
+		msleep(rc);
+	}
+
+	return rc;
+}
+
+static int ioda_eeh_phb_reset(struct pci_controller *hose, int option)
+{
+	struct pnv_phb *phb = hose->private_data;
+	s64 rc = OPAL_HARDWARE;
+
+	pr_debug("%s: Reset PHB#%x, option=%d\n",
+		__func__, hose->global_number, option);
+
+	/* Issue PHB complete reset request */
+	if (option == EEH_RESET_FUNDAMENTAL ||
+	    option == EEH_RESET_HOT)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PHB_COMPLETE,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_DEACTIVATE)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PHB_COMPLETE,
+				OPAL_DEASSERT_RESET);
+	if (rc < 0)
+		goto out;
+
+	/*
+	 * Poll state of the PHB until the request is done
+	 * successfully.
+	 */
+	rc = ioda_eeh_phb_poll(phb);
+out:
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int ioda_eeh_root_reset(struct pci_controller *hose, int option)
+{
+	struct pnv_phb *phb = hose->private_data;
+	s64 rc = OPAL_SUCCESS;
+
+	pr_debug("%s: Reset PHB#%x, option=%d\n",
+		__func__, hose->global_number, option);
+
+	/*
+	 * During the reset deassert phase, we needn't care about
+	 * the reset scope because the firmware does nothing
+	 * for fundamental or hot reset during that phase.
+	 */
+	if (option == EEH_RESET_FUNDAMENTAL)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_FUNDAMENTAL_RESET,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_HOT)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_HOT_RESET,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_DEACTIVATE)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_HOT_RESET,
+				OPAL_DEASSERT_RESET);
+	if (rc < 0)
+		goto out;
+
+	/* Poll state of the PHB until the request is done */
+	rc = ioda_eeh_phb_poll(phb);
+out:
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int ioda_eeh_bridge_reset(struct pci_controller *hose,
+		struct pci_dev *dev, int option)
+{
+	u16 ctrl;
+
+	pr_debug("%s: Reset device %04x:%02x:%02x.%01x with option %d\n",
+		__func__, hose->global_number, dev->bus->number,
+		PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), option);
+
+	switch (option) {
+	case EEH_RESET_FUNDAMENTAL:
+	case EEH_RESET_HOT:
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
+		ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
+		ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+		break;
+	}
+
+	return 0;
+}
+
+/**
+ * ioda_eeh_reset - Reset the indicated PE
+ * @pe: EEH PE
+ * @option: reset option
+ *
+ * Do reset on the indicated PE. For PCI bus sensitive PE,
+ * we need to reset the parent p2p bridge. The PHB has to
+ * be reinitialized if the p2p bridge is root bridge. For
+ * PCI device sensitive PE, we will try to reset the device
+ * through FLR. For now, we don't have OPAL APIs to do HARD
+ * reset yet, so all reset would be SOFT (HOT) reset.
+ */
+static int ioda_eeh_reset(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct eeh_dev *edev;
+	struct pci_dev *dev;
+	int ret;
+
+	/*
+	 * We have to clear the problematic state for the
+	 * corresponding PE, unless the PE is PHB associated.
+	 * In that case the PHB has fatal errors and needs a
+	 * reset; furthermore, the AIB interface isn't reliable
+	 * any more.
+	 */
+	if (!(pe->type & EEH_PE_PHB) &&
+	    (option == EEH_RESET_HOT ||
+	    option == EEH_RESET_FUNDAMENTAL)) {
+		ret = ioda_eeh_pe_clear(pe);
+		if (ret)
+			return -EIO;
+	}
+
+	/*
+	 * The rules applied to reset, either fundamental or hot reset:
+	 *
+	 * We always reset the direct upstream bridge of the PE. If the
+	 * direct upstream bridge isn't root bridge, we always take hot
+	 * reset no matter what option (fundamental or hot) is. Otherwise,
+	 * we should do the reset according to the required option.
+	 */
+	if (pe->type & EEH_PE_PHB) {
+		ret = ioda_eeh_phb_reset(hose, option);
+	} else {
+		if (pe->type & EEH_PE_DEVICE) {
+			/*
+			 * If it's device PE, we didn't refer to the parent
+			 * PCI bus yet. So we have to figure it out indirectly.
+			 */
+			edev = list_first_entry(&pe->edevs,
+					struct eeh_dev, list);
+			dev = eeh_dev_to_pci_dev(edev);
+			dev = dev->bus->self;
+		} else {
+			/*
+			 * If it's bus PE, the parent PCI bus is already there
+			 * and just pick it up.
+			 */
+			dev = pe->bus->self;
+		}
+
+		/*
+		 * Do reset based on the fact that the direct upstream bridge
+		 * is root bridge (port) or not.
+		 */
+		if (dev->bus->number == 0)
+			ret = ioda_eeh_root_reset(hose, option);
+		else
+			ret = ioda_eeh_bridge_reset(hose, dev, option);
+	}
+
+	return ret;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
 	.get_state		= ioda_eeh_get_state,
-	.reset			= NULL,
+	.reset			= ioda_eeh_reset,
 	.get_log		= NULL,
 	.configure_bridge	= NULL
 };
-- 
1.7.5.4


* [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (15 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-11  7:37   ` Benjamin Herrenschmidt
  2013-06-05  7:34 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds backends to retrieve error log and configure p2p
bridges for the indicated PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   57 ++++++++++++++++++++++++++++-
 1 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index e6c8c7f..cbef2d5 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -456,11 +456,64 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
 	return ret;
 }
 
+/**
+ * ioda_eeh_get_log - Retrieve error log
+ * @pe: EEH PE
+ * @severity: Severity level of the log
+ * @drv_log: buffer to store the log
+ * @len: space of the log buffer
+ *
+ * The function is used to retrieve error log from P7IOC.
+ */
+static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
+		char *drv_log, unsigned long len)
+{
+	s64 ret;
+	unsigned long flags;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	spin_lock_irqsave(&phb->lock, flags);
+
+	ret = opal_pci_get_phb_diag_data2(phb->opal_id,
+			phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+	if (ret) {
+		spin_unlock_irqrestore(&phb->lock, flags);
+		pr_warning("%s: Failed to retrieve log for PHB#%x-PE#%x\n",
+			__func__, hose->global_number, pe->addr);
+		return -EIO;
+	}
+
+	/*
+	 * FIXME: We probably need to log the error somewhere.
+	 * Let's sort that out in the future.
+	 */
+	/* pr_info("%s", phb->diag.blob); */
+
+	spin_unlock_irqrestore(&phb->lock, flags);
+
+	return 0;
+}
+
+/**
+ * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE
+ * @pe: EEH PE
+ *
+ * A particular PE might include PCI bridges. In order to make
+ * the PE work properly, those PCI bridges should be configured
+ * correctly. However, we need to do nothing on P7IOC since the reset
+ * function already does everything this function would cover.
+ */
+static int ioda_eeh_configure_bridge(struct eeh_pe *pe)
+{
+	return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
 	.get_state		= ioda_eeh_get_state,
 	.reset			= ioda_eeh_reset,
-	.get_log		= NULL,
-	.configure_bridge	= NULL
+	.get_log		= ioda_eeh_get_log,
+	.configure_bridge	= ioda_eeh_configure_bridge
 };
-- 
1.7.5.4


* [PATCH 18/27] powerpc/eeh: PowerNV EEH backends
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (16 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds EEH backends for the PowerNV platform. Note that
some of those EEH backends call into the I/O chip dependent backends.
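
As a rough illustration of that layering (not the actual kernel code), the
sketch below shows a platform-level backend forwarding to an optional chip
backend and falling back to a default error when none is registered. All
demo_* names are hypothetical stand-ins for struct pnv_phb, struct
pnv_eeh_ops and ioda_eeh_reset(); only the "check pointer, then forward"
shape and the -EEXIST default are taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct eeh_pe. */
struct demo_pe {
	int addr;
};

/* Hypothetical stand-in for struct pnv_eeh_ops (chip layer). */
struct demo_chip_ops {
	int (*reset)(struct demo_pe *pe, int option);
};

/* Hypothetical stand-in for struct pnv_phb. */
struct demo_phb {
	const struct demo_chip_ops *eeh_ops;	/* NULL: no chip backend */
};

/* Example chip backend, playing the role of ioda_eeh_reset(). */
static int demo_ioda_reset(struct demo_pe *pe, int option)
{
	(void)pe;
	(void)option;
	return 0;
}

static const struct demo_chip_ops demo_ioda_ops = {
	.reset = demo_ioda_reset,
};

/*
 * Platform-level backend: forward to the chip backend when one is
 * registered, otherwise keep the default error code.
 */
static int demo_platform_reset(struct demo_phb *phb, struct demo_pe *pe,
			       int option)
{
	int ret = -EEXIST;

	if (phb->eeh_ops && phb->eeh_ops->reset)
		ret = phb->eeh_ops->reset(pe, option);

	return ret;
}
```

With a PHB whose eeh_ops points at demo_ioda_ops the call returns the chip
backend's result; with eeh_ops left NULL it returns -EEXIST, which is how a
chip without EEH support stays harmless at the platform layer.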

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile      |    2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  387 ++++++++++++++++++++++++++
 2 files changed, 388 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-powernv.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 09bd0cb..7fe5951 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)	+= eeh-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
new file mode 100644
index 0000000..1530264
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -0,0 +1,387 @@
+/*
+ * The file intends to implement the platform dependent EEH operations on
+ * the powernv platform. Actually, powernv was created in order to fully
+ * support the hypervisor.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/atomic.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/msi.h>
+#include <linux/of.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/firmware.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/**
+ * powernv_eeh_init - EEH platform dependent initialization
+ *
+ * EEH platform dependent initialization on powernv
+ */
+static int powernv_eeh_init(void)
+{
+	/* We require OPALv3 */
+	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+		pr_warning("%s: OPALv3 is required !\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Set EEH probe mode */
+	eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
+
+	return 0;
+}
+
+/**
+ * powernv_eeh_post_init - EEH platform dependent post initialization
+ *
+ * EEH platform dependent post initialization on powernv. When
+ * the function is called, the EEH PEs and devices should have
+ * been built. If the I/O cache stuff has been built, EEH is
+ * ready to supply service.
+ */
+static int powernv_eeh_post_init(void)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	int ret = 0;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+
+		if (phb->eeh_ops && phb->eeh_ops->post_init) {
+			ret = phb->eeh_ops->post_init(hose);
+			if (ret)
+				break;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_dev_probe - Do probe on PCI device
+ * @dev: PCI device
+ * @flag: unused
+ *
+ * When EEH module is installed during system boot, all PCI devices
+ * are checked one by one to see if they support EEH. The function
+ * is introduced for the purpose. By default, EEH has been enabled
+ * on all PCI devices. That's to say, we only need to do the necessary
+ * initialization on the corresponding EEH device and create the PE
+ * accordingly.
+ *
+ * Note that it's unsafe to retrieve the EEH device through
+ * the corresponding PCI device. During PCI device hotplug, which
+ * was possibly triggered by the EEH core, the binding between the EEH
+ * device and the PCI device isn't built yet.
+ */
+static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct device_node *dn = pci_device_to_OF_node(dev);
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+
+	/*
+	 * When probing the root bridge, which doesn't have any
+	 * subordinate PCI devices, we don't have an OF node for
+	 * it. So it's not reasonable to continue the
+	 * probing.
+	 */
+	if (!dn || !edev)
+		return 0;
+
+	/* Skip for PCI-ISA bridge */
+	if ((dev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+		return 0;
+
+	/* Initialize eeh device */
+	edev->class_code	= dev->class;
+	edev->mode		= 0;
+	edev->config_addr	= ((dev->bus->number << 8) | dev->devfn);
+	edev->pe_config_addr	= phb->bdfn_to_pe(phb, dev->bus, dev->devfn & 0xff);
+
+	/* Create PE */
+	eeh_add_to_parent_pe(edev);
+
+	/*
+	 * Enable EEH explicitly so that we will do EEH check
+	 * while accessing I/O stuff
+	 *
+	 * FIXME: Enable that for PHB3 later
+	 */
+	if (phb->type == PNV_PHB_IODA1)
+		eeh_subsystem_enabled = 1;
+
+	/* Save memory bars */
+	eeh_save_bars(edev);
+
+	return 0;
+}
+
+/**
+ * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable
+ * @pe: EEH PE
+ * @option: operation to be issued
+ *
+ * The function is used to control the EEH functionality globally.
+ * Currently, the following options are supported according to PAPR:
+ * Enable EEH, Disable EEH, Enable MMIO and Enable DMA
+ */
+static int powernv_eeh_set_option(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	/*
+	 * What we need to do is pass it down to the hardware
+	 * implementation to handle it.
+	 */
+	if (phb->eeh_ops && phb->eeh_ops->set_option)
+		ret = phb->eeh_ops->set_option(pe, option);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_get_pe_addr - Retrieve PE address
+ * @pe: EEH PE
+ *
+ * Retrieve the PE address according to the given traditional
+ * PCI BDF (Bus/Device/Function) address.
+ */
+static int powernv_eeh_get_pe_addr(struct eeh_pe *pe)
+{
+	return pe->addr;
+}
+
+/**
+ * powernv_eeh_get_state - Retrieve PE state
+ * @pe: EEH PE
+ * @state: return value
+ *
+ * Retrieve the state of the specified PE. For IODA-compatible
+ * platforms, it should be retrieved from the IODA table. Therefore,
+ * we prefer passing down to the hardware implementation to handle
+ * it.
+ */
+static int powernv_eeh_get_state(struct eeh_pe *pe, int *state)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = EEH_STATE_NOT_SUPPORT;
+
+	if (phb->eeh_ops && phb->eeh_ops->get_state)
+		ret = phb->eeh_ops->get_state(pe, state);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_reset - Reset the specified PE
+ * @pe: EEH PE
+ * @option: reset option
+ *
+ * Reset the specified PE
+ */
+static int powernv_eeh_reset(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	if (phb->eeh_ops && phb->eeh_ops->reset)
+		ret = phb->eeh_ops->reset(pe, option);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_wait_state - Wait for PE state
+ * @pe: EEH PE
+ * @max_wait: maximal period in microsecond
+ *
+ * Wait for the state of associated PE. It might take some time
+ * to retrieve the PE's state.
+ */
+static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait)
+{
+	int ret;
+	int mwait;
+
+	while (1) {
+		ret = powernv_eeh_get_state(pe, &mwait);
+
+		/*
+		 * If the PE's state is temporarily unavailable,
+		 * we have to wait for the specified time. Otherwise,
+		 * the PE's state will be returned immediately.
+		 */
+		if (ret != EEH_STATE_UNAVAILABLE)
+			return ret;
+
+		max_wait -= mwait;
+		msleep(mwait);
+	}
+
+	return EEH_STATE_NOT_SUPPORT;
+}
+
+/**
+ * powernv_eeh_get_log - Retrieve error log
+ * @pe: EEH PE
+ * @severity: temporary or permanent error log
+ * @drv_log: driver log to be combined with retrieved error log
+ * @len: length of driver log
+ *
+ * Retrieve the temporary or permanent error from the PE.
+ */
+static int powernv_eeh_get_log(struct eeh_pe *pe, int severity,
+			char *drv_log, unsigned long len)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	if (phb->eeh_ops && phb->eeh_ops->get_log)
+		ret = phb->eeh_ops->get_log(pe, severity, drv_log, len);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE
+ * @pe: EEH PE
+ *
+ * The function will be called to reconfigure the bridges included
+ * in the specified PE so that the malfunctioning PE can be
+ * recovered.
+ */
+static int powernv_eeh_configure_bridge(struct eeh_pe *pe)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = 0;
+
+	if (phb->eeh_ops && phb->eeh_ops->configure_bridge)
+		ret = phb->eeh_ops->configure_bridge(pe);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_read_config - Read PCI config space
+ * @dn: device node
+ * @where: PCI address
+ * @size: size to read
+ * @val: return value
+ *
+ * Read config space from the specified device
+ */
+static int powernv_eeh_read_config(struct device_node *dn, int where,
+				   int size, u32 *val)
+{
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_controller *hose = edev->phb;
+
+	return hose->ops->read(dev->bus, dev->devfn, where, size, val);
+}
+
+/**
+ * powernv_eeh_write_config - Write PCI config space
+ * @dn: device node
+ * @where: PCI address
+ * @size: size to write
+ * @val: value to be written
+ *
+ * Write config space to the specified device
+ */
+static int powernv_eeh_write_config(struct device_node *dn, int where,
+				    int size, u32 val)
+{
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_controller *hose = edev->phb;
+
+	hose = pci_bus_to_host(dev->bus);
+
+	return hose->ops->write(dev->bus, dev->devfn, where, size, val);
+}
+
+static struct eeh_ops powernv_eeh_ops = {
+	.name                   = "powernv",
+	.init                   = powernv_eeh_init,
+	.post_init              = powernv_eeh_post_init,
+	.of_probe               = NULL,
+	.dev_probe              = powernv_eeh_dev_probe,
+	.set_option             = powernv_eeh_set_option,
+	.get_pe_addr            = powernv_eeh_get_pe_addr,
+	.get_state              = powernv_eeh_get_state,
+	.reset                  = powernv_eeh_reset,
+	.wait_state             = powernv_eeh_wait_state,
+	.get_log                = powernv_eeh_get_log,
+	.configure_bridge       = powernv_eeh_configure_bridge,
+	.read_config            = powernv_eeh_read_config,
+	.write_config           = powernv_eeh_write_config
+};
+
+/**
+ * eeh_powernv_init - Register platform dependent EEH operations
+ *
+ * EEH initialization on powernv platform. This function should be
+ * called before any EEH related functions.
+ */
+static int __init eeh_powernv_init(void)
+{
+	int ret = -EINVAL;
+
+	if (!machine_is(powernv))
+		return ret;
+
+	ret = eeh_ops_register(&powernv_eeh_ops);
+	if (!ret)
+		pr_info("EEH: PowerNV platform initialized\n");
+	else
+		pr_info("EEH: Failed to initialize PowerNV platform (%d)\n", ret);
+
+	return ret;
+}
+
+early_initcall(eeh_powernv_init);
-- 
1.7.5.4


* [PATCH 19/27] powerpc/eeh: Initialization for PowerNV
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (17 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch initializes EEH for the PowerNV platform. Because the OPAL
APIs require the HUB ID, we need to track that through struct pnv_phb.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c   |   16 +++++++++++++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |    6 ++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9c9d15e..48b0940 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -973,6 +973,11 @@ static void pnv_pci_ioda_fixup(void)
 	pnv_pci_ioda_setup_PEs();
 	pnv_pci_ioda_setup_seg();
 	pnv_pci_ioda_setup_DMA();
+
+#ifdef CONFIG_EEH
+	eeh_addr_cache_build();
+	eeh_init();
+#endif
 }
 
 /*
@@ -1049,7 +1054,8 @@ static void pnv_pci_ioda_shutdown(struct pnv_phb *phb)
 		       OPAL_ASSERT_RESET);
 }
 
-void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
+void __init pnv_pci_init_ioda_phb(struct device_node *np,
+				  u64 hub_id, int ioda_type)
 {
 	struct pci_controller *hose;
 	static int primary = 1;
@@ -1087,6 +1093,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 	hose->first_busno = 0;
 	hose->last_busno = 0xff;
 	hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = ioda_type;
 
@@ -1172,6 +1179,9 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 		phb->ioda.io_size, phb->ioda.io_segsize);
 
 	phb->hose->ops = &pnv_pci_ops;
+#ifdef CONFIG_EEH
+	phb->eeh_ops = &ioda_eeh_ops;
+#endif
 
 	/* Setup RID -> PE mapping function */
 	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
@@ -1212,7 +1222,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 
 void pnv_pci_init_ioda2_phb(struct device_node *np)
 {
-	pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2);
+	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }
 
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
@@ -1235,6 +1245,6 @@ void __init pnv_pci_init_ioda_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		/* Look for IODA1 PHBs */
 		if (of_device_is_compatible(phbn, "ibm,ioda-phb"))
-			pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1);
+			pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1);
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 92b37a0..ae72616 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,7 +92,7 @@ static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
 }
 
-static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
+static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
 					   void *tce_mem, u64 tce_size)
 {
 	struct pnv_phb *phb;
@@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
 	phb->hose->first_busno = 0;
 	phb->hose->last_busno = 0xff;
 	phb->hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = PNV_PHB_P5IOC2;
 	phb->model = PNV_PHB_MODEL_P5IOC2;
@@ -226,7 +227,8 @@ void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
 		    of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
-			pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
+			pnv_pci_init_p5ioc2_phb(phbn, hub_id,
+					tce_mem, tce_per_phb);
 			tce_mem += tce_per_phb;
 		}
 	}
-- 
1.7.5.4


* [PATCH 20/27] powerpc/eeh: Enable EEH check for config access
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (18 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch enables the EEH check and lets the EEH core process the EEH
errors for the PowerNV platform while accessing config space. Originally,
the implementation already had a mechanism to check for EEH errors and
tried to recover from them. However, we never let the EEH core handle
the EEH errors.
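
For readers unfamiliar with the check, the trigger is a config read that
returns all ones for the access width. The standalone sketch below shows
that comparison only; the demo_* helper names are hypothetical, and the
assumption is that EEH_IO_ERROR_VALUE(size) evaluates to an all-ones
pattern of "size" bytes, as the read path in this patch uses it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * All-ones pattern for an access of 'size' bytes. Only the sizes
 * used for PCI config accesses (1, 2 or 4) are valid here.
 */
static uint32_t demo_io_error_value(int size)
{
	return UINT32_MAX >> ((4 - size) * 8);
}

/*
 * True when the value read back looks like a possible EEH error,
 * i.e. every byte of the access came back as 0xFF. A real caller
 * would then ask the EEH core to confirm the failure.
 */
static bool demo_possible_eeh_error(uint32_t val, int size)
{
	return val == demo_io_error_value(size);
}
```

A 2-byte read returning 0xFFFF is a candidate, while 0xFFFF from a 4-byte
read is not. All ones can also be a legitimate register value, which is why
the patch still calls eeh_dev_check_failure() before reporting an error.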

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.c |   41 +++++++++++++++++++++++++++++++++-
 1 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 20af220..5d787d6 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -32,6 +32,8 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/firmware.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -259,6 +261,10 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_EEH
+	struct device_node *busdn, *dn;
+	struct eeh_pe *phb_pe = NULL;
+#endif
 	u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
 	s64 rc;
 
@@ -291,8 +297,35 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 	cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n",
 		bus->number, devfn, where, size, *val);
 
-	/* Check if the PHB got frozen due to an error (no response) */
+	/*
+	 * Check if the specified PE has been put into frozen
+	 * state. On the other hand, we needn't do that while
+	 * the PHB has been put into frozen state because of
+	 * PHB-fatal errors.
+	 */
+#ifdef CONFIG_EEH
+	phb_pe = eeh_phb_pe_get(hose);
+	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
+		return PCIBIOS_SUCCESSFUL;
+
+	if (phb->eeh_enabled) {
+		if (*val == EEH_IO_ERROR_VALUE(size)) {
+			busdn = pci_bus_to_OF_node(bus);
+			for (dn = busdn->child; dn; dn = dn->sibling) {
+				struct pci_dn *pdn = PCI_DN(dn);
+
+				if (pdn && pdn->devfn == devfn &&
+					eeh_dev_check_failure(of_node_to_eeh_dev(dn),
+							EEH_EVENT_NORMAL))
+					return PCIBIOS_DEVICE_NOT_FOUND;
+			}
+		}
+	} else {
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+	}
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
@@ -323,8 +356,14 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 	default:
 		return PCIBIOS_FUNC_NOT_SUPPORTED;
 	}
+
 	/* Check if the PHB got frozen due to an error (no response) */
+#ifdef CONFIG_EEH
+	if (!phb->eeh_enabled)
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
-- 
1.7.5.4


* [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (19 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-11  8:13   ` Benjamin Herrenschmidt
  2013-06-05  7:34 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

On the PowerNV platform, an EEH event is produced either by detection
on accessing config or I/O registers, or by interrupts dedicated
to EEH reporting. The patch adds support for processing the interrupts
dedicated to EEH reporting.

Firstly, the kernel thread will be woken up to process the incoming
interrupt. The PHBs will be scanned one by one to process all
existing EEH errors. Besides, there are multiple EEH errors that can
be reported from interrupts and we take different actions
against them:

* If the IOC is dead, we will simply panic the system.
* If the PHB is dead, we also simply panic the system.
* If the PHB is fenced, an EEH event will be sent to the EEH core and
  the fenced PHB is expected to be reset completely.
* If a specific PE has been put into frozen state, an EEH event will
  be sent to the EEH core so that the PE will be reset.
* If the error is an informational one, we just output the related
  registers for debugging purposes and no more action will be
  taken.
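
The list above boils down to a small decision table. The sketch below is
only a restatement of it in C; the enum names are hypothetical and are not
the OPAL error classes or severities used by the real handler.

```c
/* Hypothetical error classes, one per bullet in the list above. */
enum demo_err_class {
	DEMO_ERR_IOC_DEAD,
	DEMO_ERR_PHB_DEAD,
	DEMO_ERR_PHB_FENCED,
	DEMO_ERR_PE_FROZEN,
	DEMO_ERR_INFORMATIONAL,
};

/* Hypothetical actions the interrupt path can take. */
enum demo_action {
	DEMO_ACT_PANIC,		/* unrecoverable, panic the system */
	DEMO_ACT_RESET_PHB,	/* send EEH event, full PHB reset */
	DEMO_ACT_RESET_PE,	/* send EEH event, PE reset */
	DEMO_ACT_LOG_ONLY,	/* dump registers, nothing else */
};

/* Map an error class to the action described in the commit log. */
static enum demo_action demo_pick_action(enum demo_err_class cls)
{
	switch (cls) {
	case DEMO_ERR_IOC_DEAD:
	case DEMO_ERR_PHB_DEAD:
		return DEMO_ACT_PANIC;
	case DEMO_ERR_PHB_FENCED:
		return DEMO_ACT_RESET_PHB;
	case DEMO_ERR_PE_FROZEN:
		return DEMO_ACT_RESET_PE;
	case DEMO_ERR_INFORMATIONAL:
		return DEMO_ACT_LOG_ONLY;
	}

	/* Unknown class: treat it as informational in this sketch. */
	return DEMO_ACT_LOG_ONLY;
}
```

Treating an unknown class as informational is a choice made for this sketch
only; the patch itself does not say what happens in that case.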

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h             |    6 +
 arch/powerpc/platforms/powernv/Makefile    |    2 +-
 arch/powerpc/platforms/powernv/pci-err.c   |  466 ++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/eeh_event.c |    8 +
 4 files changed, 481 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-err.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d1fd5d4..68ac408 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -209,6 +209,12 @@ void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
 void eeh_remove_bus_device(struct pci_dev *, int);
 
+#ifdef CONFIG_PPC_POWERNV
+void pci_err_release(void);
+#else
+static inline void pci_err_release(void) { }
+#endif
+
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
  *
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 7fe5951..912fa7c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_EEH)	+= pci-err.o eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
new file mode 100644
index 0000000..bfc95c6
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -0,0 +1,466 @@
+/*
+ * The file intends to handle those interrupts dedicated for error
+ * detection from IOC chips. Currently, we only support P7IOC and
+ * need to support more IOC chips in the future. The interrupts have
+ * been exported to the hypervisor through "opal-interrupts" of the
+ * "ibm,opal" OF node. When one of them comes in, the hypervisor simply
+ * turns to the firmware and expects the appropriate events returned. In
+ * turn, we will format one message and queue that in order to process
+ * it at a later point.
+ *
+ * On the other hand, we need to maintain information about the states
+ * of IO HUBs and their associated PHBs. The information would be
+ * shared by hypervisor and guests in future. While hypervisor or guests
+ * are accessing IO HUBs, PHBs and PEs, the state should be checked and
+ * appropriate results returned. That would benefit EEH RTAS emulation
+ * in hypervisor as well.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+#include <linux/semaphore.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <linux/kthread.h>
+#include <linux/msi.h>
+
+#include <asm/firmware.h>
+#include <asm/sections.h>
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/pci-bridge.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/ppc-pci.h>
+#include <asm/opal.h>
+#include <asm/iommu.h>
+#include <asm/tce.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/* Debugging option */
+#ifdef PCI_ERR_DEBUG_ON
+#define PCI_ERR_DBG(args...)	pr_info(args)
+#else
+#define PCI_ERR_DBG(args...)
+#endif
+
+static struct task_struct *pci_err_thread;
+static struct semaphore pci_err_int_sem;
+static struct semaphore pci_err_seq_sem;
+static char *pci_err_diag;
+
+static void pci_err_take(void)
+{
+	down(&pci_err_seq_sem);
+}
+
+/**
+ * pci_err_release - Enable error report for sending events
+ *
+ * We're handling the EEH events one by one. Each time, there is only
+ * one EEH event caused by an error IRQ. The function is called to enable
+ * error reporting in order to send more EEH events.
+ */
+void pci_err_release(void)
+{
+	up(&pci_err_seq_sem);
+}
+
+/*
+ * When we get global interrupts (e.g. P7IOC RGC), a PCI error has
+ * happened in a critical component of the IOC or PHB. For the former
+ * case, the firmware just returns OPAL_PCI_ERR_CLASS_HUB and we needn't
+ * proceed. For the latter case, we probably need to reset one particular
+ * PHB. For that, we send an EEH event to the topmost PE of that
+ * problematic PHB so that the PHB can be reset by the EEH core.
+ */
+static int pci_err_check_phb(struct pci_controller *hose)
+{
+	struct eeh_pe *phb_pe;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_debug("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * Fence the PHB and send one event to EEH core
+	 * for further processing. We have to fence the
+	 * PHB here because the EEH core always returns
+	 * normal state for PHB PE, so we can't do it
+	 * through EEH core.
+	 */
+	if (!(phb_pe->state & EEH_PE_ISOLATED)) {
+		PCI_ERR_DBG("PCI_ERR: Fence PHB#%x and send event "
+			    "to EEH core\n", hose->global_number);
+		eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+		WARN(1, "EEH: PHB failure detected\n");
+		eeh_send_failure_event(phb_pe, EEH_EVENT_INT);
+	} else {
+		pci_err_release();
+	}
+
+	return 0;
+}
+
+/*
+ * When we get interrupts from a PHB, there are probably some PEs that
+ * have been put into frozen state. What we need to do is send one message
+ * to the EEH device, no matter which one it is, so that the EEH core
+ * can check it out and do the PE reset accordingly.
+ */
+static int pci_err_check_pe(struct pci_controller *hose, u16 pe_no)
+{
+	struct eeh_pe *phb_pe, *pe;
+	struct eeh_dev dev, *edev;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_warning("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * If the PHB has been put into fenced state, we
+	 * needn't send the duplicate event because the
+	 * whole PHB is going to take reset.
+	 */
+	if (phb_pe->state & EEH_PE_ISOLATED)
+		return 0;
+
+	/* Find the PE according to PE# */
+	memset(&dev, 0, sizeof(struct eeh_dev));
+	dev.phb = hose;
+	dev.pe_config_addr = pe_no;
+	pe = eeh_pe_get(&dev);
+	if (!pe) {
+		pr_debug("%s: Can't find PE for PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PE (%x) found for PHB#%x - PE#%x\n",
+		pe->addr, hose->global_number, pe_no);
+
+	/*
+	 * It doesn't matter which EEH device gets
+	 * the message. Just pick the one at the
+	 * topmost position.
+	 */
+	edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
+	if (!edev) {
+		pr_err("%s: No EEH devices hooked on PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: First EEH device found on PHB#%x - PE#%x\n",
+		hose->global_number, pe_no);
+
+	if (eeh_dev_check_failure(edev, EEH_EVENT_INT) != 1)
+		pci_err_release();
+
+	return 0;
+}
+
+static void pci_err_hub_diag_common(struct OpalIoP7IOCErrorData *data)
+{
+	/* GEM */
+	pr_info("  GEM XFIR:        %016llx\n", data->gemXfir);
+	pr_info("  GEM RFIR:        %016llx\n", data->gemRfir);
+	pr_info("  GEM RIRQFIR:     %016llx\n", data->gemRirqfir);
+	pr_info("  GEM Mask:        %016llx\n", data->gemMask);
+	pr_info("  GEM RWOF:        %016llx\n", data->gemRwof);
+
+	/* LEM */
+	pr_info("  LEM FIR:         %016llx\n", data->lemFir);
+	pr_info("  LEM Error Mask:  %016llx\n", data->lemErrMask);
+	pr_info("  LEM Action 0:    %016llx\n", data->lemAction0);
+	pr_info("  LEM Action 1:    %016llx\n", data->lemAction1);
+	pr_info("  LEM WOF:         %016llx\n", data->lemWof);
+}
+
+static void pci_err_hub_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoP7IOCErrorData *data;
+	long ret;
+
+	data = (struct OpalIoP7IOCErrorData *)pci_err_diag;
+	ret = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get HUB#%llx diag-data, ret=%ld\n",
+			__func__, phb->hub_id, ret);
+		return;
+	}
+
+	/* Check the error type */
+	if (data->type <= OPAL_P7IOC_DIAG_TYPE_NONE ||
+	    data->type >= OPAL_P7IOC_DIAG_TYPE_LAST) {
+		pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n",
+			__func__, phb->hub_id, data->type);
+		return;
+	}
+
+	switch (data->type) {
+	case OPAL_P7IOC_DIAG_TYPE_RGC:
+		pr_info("P7IOC diag-data for RGC\n\n");
+		pci_err_hub_diag_common(data);
+		pr_info("  RGC Status:      %016llx\n", data->rgc.rgcStatus);
+		pr_info("  RGC LDCP:        %016llx\n", data->rgc.rgcLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_BI:
+		pr_info("P7IOC diag-data for BI %s\n\n",
+			data->bi.biDownbound ? "Downbound" : "Upbound");
+		pci_err_hub_diag_common(data);
+		pr_info("  BI LDCP 0:       %016llx\n", data->bi.biLdcp0);
+		pr_info("  BI LDCP 1:       %016llx\n", data->bi.biLdcp1);
+		pr_info("  BI LDCP 2:       %016llx\n", data->bi.biLdcp2);
+		pr_info("  BI Fence Status: %016llx\n", data->bi.biFenceStatus);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_CI:
+		pr_info("P7IOC diag-data for CI Port %d\n\n",
+			data->ci.ciPort);
+		pci_err_hub_diag_common(data);
+		pr_info("  CI Port Status:  %016llx\n", data->ci.ciPortStatus);
+		pr_info("  CI Port LDCP:    %016llx\n", data->ci.ciPortLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_MISC:
+		pr_info("P7IOC diag-data for MISC\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_I2C:
+		pr_info("P7IOC diag-data for I2C\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	}
+}
+
+static void pci_err_phb_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoP7IOCPhbErrorData *data;
+	int i;
+	long ret;
+
+	data = (struct OpalIoP7IOCPhbErrorData *)pci_err_diag;
+	ret = opal_pci_get_phb_diag_data2(phb->opal_id, data, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get diag-data for PHB#%x, ret=%ld\n",
+			__func__, hose->global_number, ret);
+		return;
+	}
+
+	pr_info("PHB#%x Diag-data\n\n", hose->global_number);
+	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
+
+	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
+	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
+	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
+
+	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
+	pr_info("  slotStatus:           %08x\n", data->slotStatus);
+	pr_info("  linkStatus:           %08x\n", data->linkStatus);
+	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
+	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
+
+	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
+	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
+	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
+	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
+	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
+	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
+	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
+	pr_info("  sourceId:             %08x\n", data->sourceId);
+
+	pr_info("  errorClass:           %016llx\n", data->errorClass);
+	pr_info("  correlator:           %016llx\n", data->correlator);
+	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
+	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
+	pr_info("  lemFir:               %016llx\n", data->lemFir);
+	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
+	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
+	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
+	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
+	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
+	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
+	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
+	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
+	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
+	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
+	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
+	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
+	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
+	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
+	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
+	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
+	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
+	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
+
+	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
+		if ((data->pestA[i] >> 63) == 0 &&
+		    (data->pestB[i] >> 63) == 0)
+			continue;
+
+		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
+		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
+	}
+}
+
+/*
+ * Process PCI errors from IOC, PHB, or PE. Here's the list
+ * of expected error types and their severities, as well as
+ * the corresponding action.
+ *
+ * Type                Severity                 Action
+ * OPAL_EEH_IOC_ERROR  OPAL_EEH_SEV_IOC_DEAD    panic
+ * OPAL_EEH_IOC_ERROR  OPAL_EEH_SEV_INF         diag_data
+ * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_PHB_DEAD    panic
+ * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_PHB_FENCED  eeh
+ * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_INF         diag_data
+ * OPAL_EEH_PE_ERROR   OPAL_EEH_SEV_PE_ER       eeh
+ */
+static void pci_err_process(struct pci_controller *hose,
+			u16 err_type, u16 severity, u16 pe_no)
+{
+	PCI_ERR_DBG("PCI_ERR: Process error (%d, %d, %d) on PHB#%x\n",
+		err_type, severity, pe_no, hose->global_number);
+
+	switch (err_type) {
+	case OPAL_EEH_IOC_ERROR:
+		if (severity == OPAL_EEH_SEV_IOC_DEAD)
+			panic("Dead IOC of PHB#%x", hose->global_number);
+		else if (severity == OPAL_EEH_SEV_INF) {
+			pci_err_hub_diag_data(hose);
+			pci_err_release();
+		}
+
+		break;
+	case OPAL_EEH_PHB_ERROR:
+		if (severity == OPAL_EEH_SEV_PHB_DEAD)
+			panic("Dead PHB#%x", hose->global_number);
+		else if (severity == OPAL_EEH_SEV_PHB_FENCED)
+			pci_err_check_phb(hose);
+		else if (severity == OPAL_EEH_SEV_INF) {
+			pci_err_phb_diag_data(hose);
+			pci_err_release();
+		}
+
+		break;
+	case OPAL_EEH_PE_ERROR:
+		pci_err_check_pe(hose, pe_no);
+		break;
+	}
+}
+
+static int pci_err_handler(void *dummy)
+{
+	struct pnv_phb *phb;
+	struct pci_controller *hose, *tmp;
+	u64 frozen_pe_no;
+	u16 err_type, severity;
+	long ret;
+
+	while (!kthread_should_stop()) {
+		down(&pci_err_int_sem);
+		PCI_ERR_DBG("PCI_ERR: Get PCI error semaphore\n");
+
+		list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+			phb = hose->private_data;
+restart:
+			pci_err_take();
+			ret = opal_pci_next_error(phb->opal_id,
+					&frozen_pe_no, &err_type, &severity);
+
+			/* If OPAL API returns error, we needn't proceed */
+			if (ret != OPAL_SUCCESS) {
+				PCI_ERR_DBG("PCI_ERR: Invalid return value on "
+					    "PHB#%x (0x%lx) from opal_pci_next_error\n",
+					    hose->global_number, ret);
+				pci_err_release();
+				continue;
+			}
+
+			/* If the PHB doesn't have error, stop processing */
+			if (err_type == OPAL_EEH_NO_ERROR ||
+			    severity == OPAL_EEH_SEV_NO_ERROR) {
+				PCI_ERR_DBG("PCI_ERR: No error found on PHB#%x\n",
+					hose->global_number);
+				pci_err_release();
+				continue;
+			}
+
+			/*
+			 * Process the error until there're no pending
+			 * errors on the specific PHB.
+			 */
+			pci_err_process(hose, err_type, severity, frozen_pe_no);
+			goto restart;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * pci_err_init - Initialize PCI error handling component
+ *
+ * This must be done before the OPAL interrupts are registered,
+ * because the interrupt handling depends on it.
+ */
+static int __init pci_err_init(void)
+{
+	int ret = 0;
+
+	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+		pr_err("%s: FW_FEATURE_OPALv3 required!\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	pci_err_diag = (char *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!pci_err_diag) {
+		pr_err("%s: Failed to alloc memory for diag data\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	/* Initialize semaphore */
+	sema_init(&pci_err_int_sem, 0);
+	sema_init(&pci_err_seq_sem, 1);
+
+	/* Start kthread */
+	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
+	if (IS_ERR(pci_err_thread)) {
+		ret = PTR_ERR(pci_err_thread);
+		pr_err("%s: Failed to start kthread, ret=%d\n",
+			__func__, ret);
+	}
+
+	free_page((unsigned long)pci_err_diag);
+	return ret;
+}
+
+arch_initcall(pci_err_init);
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 1f86b80..e4c636e 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -84,6 +84,14 @@ static int eeh_event_handler(void * dummy)
 	eeh_handle_event(pe);
 	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
 
+	/*
+	 * If the event was caused by the error reporting IRQ,
+	 * we need to release the module so that subsequent
+	 * events can be fired.
+	 */
+	if (event->flag & EEH_EVENT_INT)
+		pci_err_release();
+
 	kfree(event);
 	mutex_unlock(&eeh_event_mutex);
 
-- 
1.7.5.4
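
For readers following along, the type/severity table in the pci_err_process()
comment above can be restated as a small classification function. The sketch
below is a standalone userspace model, not kernel code: every model_* and
MODEL_* name is hypothetical, and the separate IOC/PHB "dead" severities are
folded into a single MODEL_SEV_DEAD as a simplifying assumption.

```c
/* Hypothetical stand-ins for the OPAL error types and severities. */
enum model_err_type { MODEL_ERR_IOC, MODEL_ERR_PHB, MODEL_ERR_PE };
enum model_severity {
	MODEL_SEV_INF,
	MODEL_SEV_DEAD,		/* assumption: one "dead" level for IOC and PHB */
	MODEL_SEV_FENCED,
	MODEL_SEV_PE_ER
};
enum model_action {
	MODEL_ACT_NONE,
	MODEL_ACT_PANIC,
	MODEL_ACT_DIAG_DATA,
	MODEL_ACT_EEH
};

/* Map (type, severity) to the action listed in the table. */
static enum model_action model_classify(enum model_err_type type,
					enum model_severity sev)
{
	switch (type) {
	case MODEL_ERR_IOC:
		if (sev == MODEL_SEV_DEAD)
			return MODEL_ACT_PANIC;
		if (sev == MODEL_SEV_INF)
			return MODEL_ACT_DIAG_DATA;
		break;
	case MODEL_ERR_PHB:
		if (sev == MODEL_SEV_DEAD)
			return MODEL_ACT_PANIC;
		if (sev == MODEL_SEV_FENCED)
			return MODEL_ACT_EEH;
		if (sev == MODEL_SEV_INF)
			return MODEL_ACT_DIAG_DATA;
		break;
	case MODEL_ERR_PE:
		/* Any PE error goes through normal EEH recovery */
		return MODEL_ACT_EEH;
	}

	/* Combinations not in the table are ignored */
	return MODEL_ACT_NONE;
}
```

Combinations that the table does not list (for example an IOC error with a
"fenced" severity) fall through to MODEL_ACT_NONE, matching the way the
switch in the patch silently ignores them.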

* [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (20 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

It's meaningless to handle a frozen PE if the PHB has already been fenced.
The patch checks the PHB state before checking the PE. If the PHB has been
put into the fenced state, that has to be handled first.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh.c |   61 ++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)
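
The check order described in the changelog can be modeled in a few lines of
plain C. This is only a userspace sketch under stated assumptions: the
model_* names and MODEL_PE_ISOLATED are hypothetical stand-ins rather than
kernel symbols, and the hardware query is reduced to a boolean. The return
convention (0 = healthy, 1 = failure newly detected, 2 = already isolated)
follows eeh_phb_check_failure() in the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-in for EEH_PE_ISOLATED; not the kernel value. */
#define MODEL_PE_ISOLATED 0x1

struct model_pe {
	int state;	/* bit flags, e.g. MODEL_PE_ISOLATED */
	bool hw_failed;	/* what the hardware would report if asked */
};

/*
 * Returns 0 if the PE looks healthy, 1 if a failure was newly
 * detected (the PE gets isolated), 2 if it was isolated before.
 */
static int model_check(struct model_pe *pe)
{
	if (pe->state & MODEL_PE_ISOLATED)
		return 2;
	if (!pe->hw_failed)
		return 0;

	pe->state |= MODEL_PE_ISOLATED;
	return 1;
}

/* Check the PHB first; only look at the PE if the PHB is fine. */
static int model_dev_check_failure(struct model_pe *phb,
				   struct model_pe *pe)
{
	int ret = model_check(phb);

	if (ret > 0)
		return ret;

	return model_check(pe);
}
```

With a failed PHB, the first call returns 1 and leaves the PE untouched; a
second call returns 2 because the PHB is already isolated. The PE itself is
only examined once the PHB reports a healthy state.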

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index a42b410..1daff1e 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -269,6 +269,59 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	return pa | (token & (PAGE_SIZE-1));
 }
 
+/*
+ * On PowerNV platform, the PHB might already have been fenced.
+ * In that case, it's meaningless to recover the frozen PE. Instead,
+ * the fenced PHB has to be handled first.
+ */
+static int eeh_phb_check_failure(struct eeh_pe *pe, int flag)
+{
+	struct eeh_pe *phb_pe;
+	unsigned long flags;
+	int ret;
+
+	if (!eeh_probe_mode_dev())
+		return -EPERM;
+
+	/* Find the PHB PE */
+	raw_spin_lock_irqsave(&confirm_error_lock, flags);
+	phb_pe = eeh_phb_pe_get(pe->phb);
+	if (!phb_pe) {
+		pr_warning("%s: Can't find PE for PHB#%d\n",
+			__func__, pe->phb->global_number);
+		ret = -EEXIST;
+		goto out;
+	}
+
+	/* If the PHB has been in problematic state */
+	if (phb_pe->state & EEH_PE_ISOLATED) {
+		ret = 2;
+		goto out;
+	}
+
+	/* Check PHB state */
+	ret = eeh_ops->get_state(phb_pe, NULL);
+	if ((ret < 0) ||
+	    (ret == EEH_STATE_NOT_SUPPORT) ||
+	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
+	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
+		ret = 0;
+		goto out;
+	}
+
+	/* Isolate the PHB and send event */
+	eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+	eeh_send_failure_event(phb_pe, flag);
+
+	WARN(1, "EEH: PHB failure detected\n");
+
+	return 1;
+out:
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+	return ret;
+}
+
 /**
  * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
  * @edev: eeh device
@@ -320,6 +373,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev, int flag)
 		return 0;
 	}
 
+	/*
+	 * On PowerNV platform, the PHB might already have been
+	 * fenced, and that has to be handled first.
+	 */
+	ret = eeh_phb_check_failure(pe, flag);
+	if (ret > 0)
+		return ret;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
-- 
1.7.5.4

* [PATCH 23/27] powernv/opal: Notifier for OPAL events
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (21 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-12  0:32   ` Benjamin Herrenschmidt
  2013-06-05  7:34 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch implements a notifier for various OPAL events. Note that the
notifier can be disabled dynamically. The notifier is fired either on
incoming OPAL interrupts or when the OPAL notifier is enabled.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h       |    3 +
 arch/powerpc/platforms/powernv/opal.c |   79 ++++++++++++++++++++++++++++++++-
 2 files changed, 81 insertions(+), 1 deletions(-)
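
The mask-based dispatch described in the changelog can be sketched as a
standalone userspace model. All model_* names are hypothetical, a fixed-size
table replaces the kernel's linked list and spinlock, and polling for
pending events on re-enable is left out, so this shows the idea rather
than the implementation in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_HANDLERS 8

typedef void (*model_cb_t)(uint64_t events);

static struct {
	uint64_t mask;
	model_cb_t cb;
} model_handlers[MODEL_MAX_HANDLERS];
static size_t model_nr_handlers;
static int model_hold;	/* non-zero: notifier disabled */

/* Returns 0 on success, -1 on bad arguments, overlap, or a full table. */
static int model_register(uint64_t mask, model_cb_t cb)
{
	size_t i;

	if (!mask || !cb || model_nr_handlers == MODEL_MAX_HANDLERS)
		return -1;

	/* Reject a duplicate callback or an overlapping event mask */
	for (i = 0; i < model_nr_handlers; i++) {
		if (model_handlers[i].cb == cb ||
		    (model_handlers[i].mask & mask))
			return -1;
	}

	model_handlers[model_nr_handlers].mask = mask;
	model_handlers[model_nr_handlers].cb = cb;
	model_nr_handlers++;
	return 0;
}

/* Pass each handler only the event bits it asked for. */
static void model_dispatch(uint64_t events)
{
	size_t i;

	if (model_hold || !events)
		return;

	for (i = 0; i < model_nr_handlers; i++) {
		uint64_t hit = events & model_handlers[i].mask;

		if (hit)
			model_handlers[i].cb(hit);
	}
}

static void model_enable(int enable)
{
	model_hold = !enable;
}

/* Example handler: accumulate every event bit that was delivered. */
static uint64_t model_last_seen;

static void model_record(uint64_t events)
{
	model_last_seen |= events;
}
```

Registering model_record for mask 0x3 and dispatching 0x6 delivers only bit
0x2 to the handler. While the notifier is held (disabled), dispatches are
dropped rather than queued.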

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 2880797..64e7c84 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -644,6 +644,9 @@ extern void hvc_opal_init_early(void);
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
 
+extern int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t));
+extern void opal_notifier_enable(bool enable);
+
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
 
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 628c564..9bbbf93 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -26,11 +26,20 @@ struct opal {
 	u64 entry;
 } opal;
 
+struct opal_cb {
+	struct list_head list;
+	uint64_t mask;
+	void (*cb)(uint64_t);
+};
+
 static struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
 extern u64 opal_mc_secondary_handler[];
 static unsigned int *opal_irqs;
 static unsigned int opal_irq_count;
+static LIST_HEAD(opal_notifier);
+static DEFINE_SPINLOCK(opal_notifier_lock);
+static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
 
 int __init early_init_dt_scan_opal(unsigned long node,
 				   const char *uname, int depth, void *data)
@@ -95,6 +104,74 @@ static int __init opal_register_exception_handlers(void)
 
 early_initcall(opal_register_exception_handlers);
 
+int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t))
+{
+	unsigned long flags;
+	struct opal_cb *p, *tmp;
+
+	if (!mask || !cb) {
+		pr_warning("%s: Invalid argument (%llx, %p)!\n",
+			__func__, mask, cb);
+		return -EINVAL;
+	}
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p) {
+		pr_warning("%s: Out of memory (%llx, %p)!\n",
+			__func__, mask, cb);
+		return -ENOMEM;
+	}
+	p->mask = mask;
+	p->cb   = cb;
+
+	spin_lock_irqsave(&opal_notifier_lock, flags);
+	list_for_each_entry(tmp, &opal_notifier, list) {
+		if (tmp->cb == cb || tmp->mask & mask) {
+			pr_warning("%s: Duplicate event handler (%llx, %p)\n",
+				__func__, tmp->mask, tmp->cb);
+			spin_unlock_irqrestore(&opal_notifier_lock, flags);
+			kfree(p);
+			return -EEXIST;
+		}
+	}
+
+	list_add_tail(&p->list, &opal_notifier);
+	spin_unlock_irqrestore(&opal_notifier_lock, flags);
+
+	return 0;
+}
+
+static void opal_do_notifier(uint64_t events)
+{
+	struct opal_cb *tmp;
+
+	if (atomic_read(&opal_notifier_hold))
+		return;
+	if (!events)
+		return;
+
+	list_for_each_entry(tmp, &opal_notifier, list) {
+		if (events & tmp->mask)
+			tmp->cb(events & tmp->mask);
+	}
+}
+
+void opal_notifier_enable(bool enable)
+{
+	int64_t rc;
+	uint64_t evt = 0;
+
+	if (enable) {
+		atomic_set(&opal_notifier_hold, 0);
+
+		/* Process pending events */
+		rc = opal_poll_events(&evt);
+		if (rc == OPAL_SUCCESS && evt)
+			opal_do_notifier(evt);
+	} else
+		atomic_set(&opal_notifier_hold, 1);
+}
+
 int opal_get_chars(uint32_t vtermno, char *buf, int count)
 {
 	s64 len, rc;
@@ -297,7 +374,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
 
 	opal_handle_interrupt(virq_to_hw(irq), &events);
 
-	/* XXX TODO: Do something with the events */
+	opal_do_notifier(events);
 
 	return IRQ_HANDLED;
 }
-- 
1.7.5.4

* [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (22 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While restarting or powering off the system, the OPAL notifier is
no longer needed, so just disable it.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index d4459bf..e51675c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -93,6 +93,8 @@ static void  __noreturn pnv_restart(char *cmd)
 {
 	long rc = OPAL_BUSY;
 
+	opal_notifier_enable(false);
+
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
 		rc = opal_cec_reboot();
 		if (rc == OPAL_BUSY_EVENT)
@@ -108,6 +110,8 @@ static void __noreturn pnv_power_off(void)
 {
 	long rc = OPAL_BUSY;
 
+	opal_notifier_enable(false);
+
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
 		rc = opal_cec_power_down(0);
 		if (rc == OPAL_BUSY_EVENT)
-- 
1.7.5.4

* [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (23 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch registers an OPAL event notifier to process the PCI errors
reported by firmware. If there are pending PCI errors, the kthread
will be woken up to handle them in turn.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-err.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
index bfc95c6..d77dd61 100644
--- a/arch/powerpc/platforms/powernv/pci-err.c
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -64,6 +64,12 @@ static struct semaphore pci_err_int_sem;
 static struct semaphore pci_err_seq_sem;
 static char *pci_err_diag;
 
+static void pci_err_event(u64 event)
+{
+	/* Notify kthread to process error */
+	up(&pci_err_int_sem);
+}
+
 static void pci_err_take(void)
 {
 	down(&pci_err_seq_sem);
@@ -451,6 +457,14 @@ static int __init pci_err_init(void)
 	sema_init(&pci_err_int_sem, 0);
 	sema_init(&pci_err_seq_sem, 1);
 
+	/* Register OPAL event notifier */
+	ret = opal_notifier_register(OPAL_EVENT_PCI_ERROR, pci_err_event);
+	if (ret) {
+		pr_err("%s: Failed to register OPAL notifier, rc=%d\n",
+			__func__, ret);
+		goto out;
+	}
+
 	/* Start kthread */
 	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
 	if (IS_ERR(pci_err_thread)) {
@@ -459,6 +473,7 @@ static int __init pci_err_init(void)
 			__func__, ret);
 	}
 
+out:
 	free_page((unsigned long)pci_err_diag);
 	return ret;
 }
-- 
1.7.5.4

* [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (24 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-05  7:34 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
  2013-06-11  7:46 ` [PATCH v3 00/27] EEH Support for PowerNV platform Benjamin Herrenschmidt
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch creates one debugfs directory ("powerpc/PCIxxxx") for
each PHB so that the EEH error injection debugfs entry can be
hooked there in the following patch.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   22 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |    4 ++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 48b0940..0d9d302 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/string.h>
 #include <linux/init.h>
@@ -968,12 +969,33 @@ static void pnv_pci_ioda_setup_DMA(void)
 	}
 }
 
+static void pnv_pci_ioda_create_dbgfs(void)
+{
+#ifdef CONFIG_DEBUG_FS
+	struct pci_controller *hose, *tmp;
+	struct pnv_phb *phb;
+	char name[16];
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+
+		sprintf(name, "PCI%04x", hose->global_number);
+		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
+		if (!phb->dbgfs)
+			pr_warning("%s: Error on creating debugfs on PHB#%x\n",
+				__func__, hose->global_number);
+	}
+#endif /* CONFIG_DEBUG_FS */
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
 	pnv_pci_ioda_setup_PEs();
 	pnv_pci_ioda_setup_seg();
 	pnv_pci_ioda_setup_DMA();
 
+	pnv_pci_ioda_create_dbgfs();
+
 #ifdef CONFIG_EEH
 	eeh_addr_cache_build();
 	eeh_init();
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 1770188..3418a8e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -94,6 +94,10 @@ struct pnv_phb {
 	int			eeh_enabled;
 #endif
 
+#ifdef CONFIG_DEBUG_FS
+	struct dentry		*dbgfs;
+#endif
+
 #ifdef CONFIG_PCI_MSI
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
-- 
1.7.5.4

* [PATCH 27/27] powerpc/eeh: Debugfs for error injection
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (25 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  2013-06-11  7:46 ` [PATCH v3 00/27] EEH Support for PowerNV platform Benjamin Herrenschmidt
  27 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for
injecting EEH errors for testing purposes.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   33 ++++++++++++++++++++++++++++-
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index cbef2d5..6b13405 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -21,6 +21,7 @@
  */
 
 #include <linux/bootmem.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/io.h>
@@ -43,6 +44,29 @@
 #include "powernv.h"
 #include "pci.h"
 
+#ifdef CONFIG_DEBUG_FS
+static int ioda_eeh_dbgfs_set(void *data, u64 val)
+{
+	struct pci_controller *hose = data;
+	struct pnv_phb *phb = hose->private_data;
+
+	out_be64(phb->regs + 0xD10, val);
+	return 0;
+}
+
+static int ioda_eeh_dbgfs_get(void *data, u64 *val)
+{
+	struct pci_controller *hose = data;
+	struct pnv_phb *phb = hose->private_data;
+
+	*val = in_be64(phb->regs + 0xD10);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_dbgfs_ops, ioda_eeh_dbgfs_get,
+			ioda_eeh_dbgfs_set, "0x%llx\n");
+#endif /* CONFIG_DEBUG_FS */
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -56,8 +80,15 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	struct pnv_phb *phb = hose->private_data;
 
 	/* FIXME: Enable it for PHB3 later */
-	if (phb->type == PNV_PHB_IODA1)
+	if (phb->type == PNV_PHB_IODA1) {
+#ifdef CONFIG_DEBUG_FS
+		if (phb->dbgfs)
+			debugfs_create_file("err_injct", 0600,
+				phb->dbgfs, hose, &ioda_eeh_dbgfs_ops);
+#endif
+
 		phb->eeh_enabled = 1;
+	}
 
 	return 0;
 }
-- 
1.7.5.4

* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-05  7:34 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
@ 2013-06-11  7:37   ` Benjamin Herrenschmidt
  2013-06-12  3:32     ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-11  7:37 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
> The patch adds I/O chip backend to retrieve the state for the
> indicated PE. While the PE state is temperarily unavailable,
> we return the default wait time (1000ms).
> 
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/eeh-ioda.c |  102 ++++++++++++++++++++++++++++-
>  1 files changed, 101 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
> index e24622e..3c72321 100644
> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
> @@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
>  	return ret;
>  }
>  
> +/**
> + * ioda_eeh_get_state - Retrieve the state of PE
> + * @pe: EEH PE
> + * @state: return value
> + *
> + * The PE's state should be retrieved from the PEEV, PEST
> + * IODA tables. Since the OPAL has exported the function
> + * to do it, it'd better to use that.
> + */
> +static int ioda_eeh_get_state(struct eeh_pe *pe, int *state)
> +{

So everywhere you have this "state" argument which isn't a state but a delay ...

Moreover you only initialize it in one specific case and leave it otherwise
uninitialized....

At the very least, init it to 0 by default so as not to leave a dangling
"return argument" like that. However, I still have a problem with it:

> +	case OPAL_EEH_STOPPED_TEMP_UNAVAIL:
> +		result |= EEH_STATE_UNAVAILABLE;
> +		if (state)
> +			*state = 1000;
> +		break;

This is the *only* case where we return anything here. Why do we bother
then and not have the upper layer simply wait one second whenever it gets
a temp unavailable result? (btw, you didn't differentiate temp unavailable
from permanently unavailable in your API).

This has impacts on patch 18/27 which I'll cover here:

> +/**
> + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable
> + * @pe: EEH PE
> + * @option: operation to be issued
> + *
> + * The function is used to control the EEH functionality globally.
> + * Currently, following options are support according to PAPR:
> + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA
> + */
> +static int powernv_eeh_set_option(struct eeh_pe *pe, int option)
> +{
> +	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb = hose->private_data;
> +	int ret = -EEXIST;
> +
> +	/*
> +	 * What we need do is pass it down for hardware
> +	 * implementation to handle it.
> +	 */
> +	if (phb->eeh_ops && phb->eeh_ops->set_option)
> +		ret = phb->eeh_ops->set_option(pe, option);
> +
> +	return ret;
> +}

Should we implement something here ? IE. Should we look into
disabling freezing in the PHB via the firmware ? Or we just don't care ?

> +/**
> + * powernv_eeh_get_pe_addr - Retrieve PE address
> + * @pe: EEH PE
> + *
> + * Retrieve the PE address according to the given tranditional
> + * PCI BDF (Bus/Device/Function) address.
> + */
> +static int powernv_eeh_get_pe_addr(struct eeh_pe *pe)
> +{
> +	return pe->addr;
> +}
>
> +/**
> + * powernv_eeh_get_state - Retrieve PE state
> + * @pe: EEH PE
> + * @state: return value
> + *
> + * Retrieve the state of the specified PE. For IODA-compitable
> + * platform, it should be retrieved from IODA table. Therefore,
> + * we prefer passing down to hardware implementation to handle
> + * it.
> + */
> +static int powernv_eeh_get_state(struct eeh_pe *pe, int *state)
> +{
> +	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb = hose->private_data;
> +	int ret = EEH_STATE_NOT_SUPPORT;
> +
> +	if (phb->eeh_ops && phb->eeh_ops->get_state)
> +		ret = phb->eeh_ops->get_state(pe, state);
> +
> +	return ret;
> +}

Same comments about "state" which is really "delay" and is probably
not necessary at all ...

> +/**
> + * powernv_eeh_reset - Reset the specified PE
> + * @pe: EEH PE
> + * @option: reset option
> + *
> + * Reset the specified PE
> + */
> +static int powernv_eeh_reset(struct eeh_pe *pe, int option)
> +{
> +	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb = hose->private_data;
> +	int ret = -EEXIST;
> +
> +	if (phb->eeh_ops && phb->eeh_ops->reset)
> +		ret = phb->eeh_ops->reset(pe, option);
> +
> +	return ret;
> +}
> +
> +/**
> + * powernv_eeh_wait_state - Wait for PE state
> + * @pe: EEH PE
> + * @max_wait: maximal period in microsecond
> + *
> + * Wait for the state of associated PE. It might take some time
> + * to retrieve the PE's state.
> + */
> +static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait)
> +{
> +	int ret;
> +	int mwait;
> +
> +	while (1) {
> +		ret = powernv_eeh_get_state(pe, &mwait);
> +
> +		/*
> +		 * If the PE's state is temporarily unavailable,
> +		 * we have to wait for the specified time. Otherwise,
> +		 * the PE's state will be returned immediately.
> +		 */
> +		if (ret != EEH_STATE_UNAVAILABLE)
> +			return ret;

So here we do a compare, while ret is actually a bit mask ...

In fact, ret should be named state_mask or something like that for clarity
and you should do a bit test here. Also do you want to differentiate
permanent unavailability from temp. unavailability ?

> +		max_wait -= mwait;

You decrement max_wait but never test it or use it. You probably mean to

  - Limit mwait to max_wait
  - If mwait is 0, return
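
The two points above (test a bit instead of comparing, and actually
honour max_wait) can be sketched in plain userspace C. This is only an
illustration under stated assumptions: the STATE_* bits, wait_state(),
fake_get_state() and fake_sleep() are hypothetical stand-ins, not the
kernel's EEH_STATE_* values, platform hooks or msleep().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state bits; the real EEH_STATE_* values differ. */
#define STATE_UNAVAILABLE	(1 << 0)
#define STATE_MMIO_ACTIVE	(1 << 1)
#define STATE_DMA_ACTIVE	(1 << 2)

/*
 * Stand-in for the platform get_state hook: returns a state mask and,
 * when the state is unavailable, a suggested delay in milliseconds.
 */
typedef int (*get_state_fn)(void *ctx, int *delay_ms);

/* Stand-in for msleep(); a real caller would actually sleep here. */
static void fake_sleep(int ms)
{
	(void)ms;
}

static int wait_state(get_state_fn get_state, void *ctx, int max_wait_ms)
{
	assert(get_state != NULL);

	for (;;) {
		int delay_ms = 0;
		int state_mask = get_state(ctx, &delay_ms);

		/* Bit test, not a comparison: other bits may be set too. */
		if (!(state_mask & STATE_UNAVAILABLE))
			return state_mask;

		/* Limit the delay to whatever is left of the budget. */
		if (delay_ms > max_wait_ms)
			delay_ms = max_wait_ms;

		/* Nothing left to wait for: give up with the last state. */
		if (delay_ms <= 0)
			return state_mask;

		max_wait_ms -= delay_ms;
		fake_sleep(delay_ms);
	}
}

/* Demo hook: reports "unavailable" for the first polls_left calls. */
struct fake_pe {
	int polls_left;
	int delay_ms;
};

static int fake_get_state(void *ctx, int *delay_ms)
{
	struct fake_pe *pe = ctx;

	if (pe->polls_left > 0) {
		pe->polls_left--;
		*delay_ms = pe->delay_ms;
		return STATE_UNAVAILABLE;
	}
	return STATE_MMIO_ACTIVE | STATE_DMA_ACTIVE;
}
```

With this shape the loop always terminates: either the unavailable bit
clears, or the clamped delay reaches zero once the budget is used up.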

> +		msleep(mwait);
> +	}
> +
> +	return EEH_STATE_NOT_SUPPORT;
> +}
> +
> +/**
> + * powernv_eeh_get_log - Retrieve error log
> + * @pe: EEH PE
> + * @severity: temporary or permanent error log
> + * @drv_log: driver log to be combined with retrieved error log
> + * @len: length of driver log
> + *
> + * Retrieve the temporary or permanent error from the PE.
> + */
> +static int powernv_eeh_get_log(struct eeh_pe *pe, int severity,
> +			char *drv_log, unsigned long len)
> +{
> +	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb = hose->private_data;
> +	int ret = -EEXIST;
> +
> +	if (phb->eeh_ops && phb->eeh_ops->get_log)
> +		ret = phb->eeh_ops->get_log(pe, severity, drv_log, len);
> +
> +	return ret;
> +}
> +
> +/**
> + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE
> + * @pe: EEH PE
> + *
> + * The function will be called to reconfigure the bridges included
> + * in the specified PE so that the malfunctioning PE can be
> + * recovered.
> + */
> +static int powernv_eeh_configure_bridge(struct eeh_pe *pe)
> +{
> +	struct pci_controller *hose = pe->phb;
> +	struct pnv_phb *phb = hose->private_data;
> +	int ret = 0;
> +
> +	if (phb->eeh_ops && phb->eeh_ops->configure_bridge)
> +		ret = phb->eeh_ops->configure_bridge(pe);
> +
> +	return ret;
> +}


Ben.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup
  2013-06-05  7:34 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
@ 2013-06-11  7:37   ` Benjamin Herrenschmidt
  2013-06-12  3:33     ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-11  7:37 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
> The patch adds backends to retrieve error log and configure p2p
> bridges for the indicated PE.
> 
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---

> +/**
> + * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE
> + * @pe: EEH PE
> + *
> + * A particular PE might include PCI bridges. In order to make
> + * the PE work properly, those PCI bridges should be configured
> + * correctly. However, we need to do nothing on P7IOC since the reset
> + * function will do everything that should be covered by the function.
> + */
> +static int ioda_eeh_configure_bridge(struct eeh_pe *pe)
> +{
> +	return 0;

Does it now ?

IE. Who reconfigures the windows and other config space bits of P2P
bridges ? Or is this handled elsewhere in Linux or in the upper levels
of EEH ? Or is that only needed for the PHB ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 00/27] EEH Support for PowerNV platform
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (26 preceding siblings ...)
  2013-06-05  7:34 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
@ 2013-06-11  7:46 ` Benjamin Herrenschmidt
  2013-06-12  3:18   ` Gavin Shan
  27 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-11  7:46 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
> Initially, the series of patches is built based on 3.10.RC1 and the patchset
> doesn't intend to enable EEH functionality for PHB3 for now. Obviously, PHB3
> EEH support on PowerNV platform is something to do in future.

One thing missing here is a first patch that moves the eeh core out of
platform/pseries or things will simply not build if CONFIG_PPC_PSERIES
isn't enabled :-)

Move the whole lot to arch/powerpc/kernel

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-05  7:34 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
@ 2013-06-11  8:13   ` Benjamin Herrenschmidt
  2013-06-13  4:14     ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-11  8:13 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:

> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index d1fd5d4..68ac408 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -209,6 +209,12 @@ void eeh_add_device_tree_late(struct pci_bus *);
>  void eeh_add_sysfs_files(struct pci_bus *);
>  void eeh_remove_bus_device(struct pci_dev *, int);
>  
> +#ifdef CONFIG_PPC_POWERNV
> +void pci_err_release(void);
> +#else
> +static inline void pci_err_release(void) { }
> +#endif

That business of the EEH core calling back into the powernv code
directly is gross. We don't do that...

See below for a discussion...

.../...

> +static void pci_err_take(void)
> +{
> +	down(&pci_err_seq_sem);
> +}
> +
> +/**
> + * pci_err_release - Enable error report for sending events
> + *
> + * We're handling the EEH events one by one. At any time, there is
> + * only one outstanding EEH event caused by the error IRQ. The function
> + * is called to enable error reporting so that more EEH events can be sent.
> + */
> +void pci_err_release(void)
> +{
> +	up(&pci_err_seq_sem);
> +}

So it's generally bad to keep a semaphore held like that for a long
time, taken in one corner of the kernel and released in another.

I think you need to do something else. I'm not 100% certain what but
that doesn't seem right to me.

Also you have two problems I see here:

 - A given error will come potentially as both an interrupt and
a return of ffff's from MMIO. You don't know which one will get it first
and you end up going through two fairly different code path maybe. IE.
What happens if interrupts are off for a while on the CPU that is
targeted by the PHB interrupt and you "detect" a PHB fence as a result
of an MMIO on another CPU ? Will the normal EEH process clear the fence
and your interrupt completely miss logging any of those fancy messages
you added to this file ?

 - You create another kthread ... we already have one in eeh_event.c,
why another ?

I think you need to rethink that part. My idea is that the EEH
interrupts coming from the OPAL notifier would cause you to queue up
EEH events just like the current ones.

IE. Everything (including get_next_error) should be done by the one EEH
thread. This also avoids the needs for those extra semaphores.

One option is to create an event without a PE pointer at all. When
eeh_event_handler() gets that, it would iterate a new hook,
eeh_ops->next_error() which returns the PE.

That way you can do your printing for fences etc... and return the
top-level PE for anything PHB-wide. You may also want to add a flag
maybe to return non-recoverable events and essentially make EEH just
remove the offending devices from the system instead of panic'ing (panic
is never a good idea, for all I know, the dead PHB or dead IOC wasn't
critical to the system operating normally and you may have killed my
ability to even recover the logs by panic'ing).

Think a bit about it. I know the RTAS model is fairly different than our
model here, but I like the idea that on powernv, even if we detect an
MMIO freeze, we don't directly tell the EEH core to process *that* PE
but instead do the whole next_error thing as well. If the freeze was the
result of a fence, there's no point trying to process that specific PE.

Something like a fence would thus look like that:

 - [ Case 1 -> fence interrupt -> queue eeh_event with no PE ]
   [ Case 2 -> MMIO freeze detected -> queue eeh event with no PE ]

 - eeh_event_handler() sees no PE, loops around eeh_ops->get_next_error,
since we are single threaded in the EEH thread, it's ok for the IODA
backend to "cache" the current error data so that subsequent calls into
the backend know what we are doing.

 - get_next_error sees the fence, returns the top-level PE and starts
the reset (don't wait)

 - eeh_event_handler() calls the drivers for all devices on that PE
(including children) to notify them something's wrong (TODO: Add passing
by the upper level that this is a fatal error and don't attempt to
recover).

 - It then calls wait_state() which knows it's waiting on a fence, and
does the appropriate waiting etc...

 - Back to normal process...

Don't you think that might be cleaner ? Or do you see a gaping hole in
my description ?
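
The flow described above can be sketched as a small userspace model.
It is a rough illustration only: struct pe, struct event, struct
backend_ops, handle_event() and the fake backend are all hypothetical
names, and the real EEH core does far more per PE (driver callbacks,
reset, wait_state). The point is that one thread drains the backend
through a next_error-style hook whenever an event carries no PE, so no
semaphore is needed between the interrupt path and the recovery path.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical stand-ins for illustration only. */
struct pe {
	int addr;
	int notified;	/* how many times drivers on this PE were told */
};

enum err_sev { ERR_NONE, ERR_PE_FROZEN, ERR_PHB_FENCED, ERR_DEAD };

/* Modelled on the proposed eeh_ops->next_error() hook. */
struct backend_ops {
	enum err_sev (*next_error)(struct pe **pe);
};

struct event {
	struct pe *pe;	/* NULL means "ask the backend what happened" */
};

static int nr_recovered, nr_removed;

static void handle_pe(struct pe *pe, enum err_sev sev)
{
	pe->notified++;
	if (sev == ERR_DEAD)
		nr_removed++;	/* remove the devices, do not panic */
	else
		nr_recovered++;	/* normal reset and recovery path */
}

/* Runs in the single event thread, so the backend may cache state. */
static void handle_event(const struct backend_ops *ops,
			 const struct event *ev)
{
	assert(ops && ops->next_error);

	if (ev->pe) {
		/* Event that already names its PE. */
		handle_pe(ev->pe, ERR_PE_FROZEN);
		return;
	}

	/* No PE: loop until the backend has no more errors to report. */
	for (;;) {
		struct pe *pe = NULL;
		enum err_sev sev = ops->next_error(&pe);

		if (sev == ERR_NONE || !pe)
			break;
		handle_pe(pe, sev);
	}
}

/* Fake backend: one fenced PHB (top-level PE) and one dead device PE. */
static struct pe phb_pe = { 0, 0 };
static struct pe dev_pe = { 5, 0 };
static const struct {
	enum err_sev sev;
	struct pe *pe;
} pending[] = {
	{ ERR_PHB_FENCED, &phb_pe },
	{ ERR_DEAD, &dev_pe },
};
static size_t pending_pos;

static enum err_sev fake_next_error(struct pe **pe)
{
	if (pending_pos >= sizeof(pending) / sizeof(pending[0]))
		return ERR_NONE;
	*pe = pending[pending_pos].pe;
	return pending[pending_pos++].sev;
}
```

Both the interrupt case and the MMIO-freeze case would queue the same
PE-less event, so they converge on one code path.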

> +static void pci_err_hub_diag_common(struct OpalIoP7IOCErrorData *data)
> +{
> +	/* GEM */
> +	pr_info("  GEM XFIR:        %016llx\n", data->gemXfir);
> +	pr_info("  GEM RFIR:        %016llx\n", data->gemRfir);
> +	pr_info("  GEM RIRQFIR:     %016llx\n", data->gemRirqfir);
> +	pr_info("  GEM Mask:        %016llx\n", data->gemMask);
> +	pr_info("  GEM RWOF:        %016llx\n", data->gemRwof);
> +
> +	/* LEM */
> +	pr_info("  LEM FIR:         %016llx\n", data->lemFir);
> +	pr_info("  LEM Error Mask:  %016llx\n", data->lemErrMask);
> +	pr_info("  LEM Action 0:    %016llx\n", data->lemAction0);
> +	pr_info("  LEM Action 1:    %016llx\n", data->lemAction1);
> +	pr_info("  LEM WOF:         %016llx\n", data->lemWof);
> +}

That stuff is P7IOC specific. Make sure you make it clear in the
function name and that you check the diag data "type". IE. Use a new
diag_data2 function that returns a type. We can obsolete the old one.
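
One possible reading of "check the type" is sketched below in userspace
C. Everything in it is an assumption made for illustration: the common
header (struct diag_hdr), the DIAG_IOC_* tags and the trimmed-down
struct p7ioc_hub_diag are hypothetical and do not reflect the actual
OPAL diag-data layout. It only shows the idea of validating a chip tag
and a length before casting the buffer to a chip-specific structure.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical I/O chip tags carried at the start of the buffer. */
enum diag_ioc_type { DIAG_IOC_UNKNOWN = 0, DIAG_IOC_P7IOC = 1 };

/* Hypothetical common header shared by all diag-data layouts. */
struct diag_hdr {
	uint32_t ioc_type;
	uint32_t len;
};

/* Hypothetical, heavily trimmed P7IOC hub layout. */
struct p7ioc_hub_diag {
	struct diag_hdr hdr;
	uint64_t gem_xfir;
	uint64_t lem_fir;
};

/* P7IOC-only dump: refuses buffers of another type or too short. */
static int p7ioc_hub_diag_dump(const struct diag_hdr *hdr)
{
	const struct p7ioc_hub_diag *data;

	assert(hdr != NULL);
	if (hdr->ioc_type != DIAG_IOC_P7IOC || hdr->len < sizeof(*data))
		return -1;

	/* Safe: hdr is the first member of struct p7ioc_hub_diag. */
	data = (const struct p7ioc_hub_diag *)hdr;
	printf("  GEM XFIR: %016" PRIx64 "\n", data->gem_xfir);
	printf("  LEM FIR:  %016" PRIx64 "\n", data->lem_fir);
	return 0;
}
```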

> +static void pci_err_hub_diag_data(struct pci_controller *hose)
> +{
> +	struct pnv_phb *phb = hose->private_data;
> +	struct OpalIoP7IOCErrorData *data;
> +	long ret;
> +
> +	data = (struct OpalIoP7IOCErrorData *)pci_err_diag;
> +	ret = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
> +	if (ret != OPAL_SUCCESS) {
> +		pr_warning("%s: Failed to get HUB#%llx diag-data, ret=%ld\n",
> +			__func__, phb->hub_id, ret);
> +		return;
> +	}
> +
> +	/* Check the error type */
> +	if (data->type <= OPAL_P7IOC_DIAG_TYPE_NONE ||
> +	    data->type >= OPAL_P7IOC_DIAG_TYPE_LAST) {
> +		pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n",
> +			__func__, phb->hub_id, data->type);
> +		return;
> +	}
> +
> +	switch (data->type) {
> +	case OPAL_P7IOC_DIAG_TYPE_RGC:
> +		pr_info("P7IOC diag-data for RGC\n\n");
> +		pci_err_hub_diag_common(data);
> +		pr_info("  RGC Status:      %016llx\n", data->rgc.rgcStatus);
> +		pr_info("  RGC LDCP:        %016llx\n", data->rgc.rgcLdcp);
> +		break;
> +	case OPAL_P7IOC_DIAG_TYPE_BI:
> +		pr_info("P7IOC diag-data for BI %s\n\n",
> +			data->bi.biDownbound ? "Downbound" : "Upbound");
> +		pci_err_hub_diag_common(data);
> +		pr_info("  BI LDCP 0:       %016llx\n", data->bi.biLdcp0);
> +		pr_info("  BI LDCP 1:       %016llx\n", data->bi.biLdcp1);
> +		pr_info("  BI LDCP 2:       %016llx\n", data->bi.biLdcp2);
> +		pr_info("  BI Fence Status: %016llx\n", data->bi.biFenceStatus);
> +		break;
> +	case OPAL_P7IOC_DIAG_TYPE_CI:
> +		pr_info("P7IOC diag-data for CI Port %d\n\n",
> +			data->ci.ciPort);
> +		pci_err_hub_diag_common(data);
> +		pr_info("  CI Port Status:  %016llx\n", data->ci.ciPortStatus);
> +		pr_info("  CI Port LDCP:    %016llx\n", data->ci.ciPortLdcp);
> +		break;
> +	case OPAL_P7IOC_DIAG_TYPE_MISC:
> +		pr_info("P7IOC diag-data for MISC\n\n");
> +		pci_err_hub_diag_common(data);
> +		break;
> +	case OPAL_P7IOC_DIAG_TYPE_I2C:
> +		pr_info("P7IOC diag-data for I2C\n\n");
> +		pci_err_hub_diag_common(data);
> +		break;
> +	}
> +}
> +
> +static void pci_err_phb_diag_data(struct pci_controller *hose)
> +{
> +	struct pnv_phb *phb = hose->private_data;
> +	struct OpalIoP7IOCPhbErrorData *data;
> +	int i;
> +	long ret;
> +
> +	data = (struct OpalIoP7IOCPhbErrorData *)pci_err_diag;
> +	ret = opal_pci_get_phb_diag_data2(phb->opal_id, data, PAGE_SIZE);
> +	if (ret != OPAL_SUCCESS) {
> +		pr_warning("%s: Failed to get diag-data for PHB#%x, ret=%ld\n",
> +			__func__, hose->global_number, ret);
> +		return;
> +	}
> +
> +	pr_info("PHB#%x Diag-data\n\n", hose->global_number);
> +	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
> +
> +	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
> +	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
> +	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
> +
> +	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
> +	pr_info("  slotStatus:           %08x\n", data->slotStatus);
> +	pr_info("  linkStatus:           %08x\n", data->linkStatus);
> +	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
> +	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
> +
> +	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
> +	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
> +	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
> +	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
> +	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
> +	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
> +	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
> +	pr_info("  sourceId:             %08x\n", data->sourceId);
> +
> +	pr_info("  errorClass:           %016llx\n", data->errorClass);
> +	pr_info("  correlator:           %016llx\n", data->correlator);
> +	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
> +	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
> +	pr_info("  lemFir:               %016llx\n", data->lemFir);
> +	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
> +	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
> +	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
> +	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
> +	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
> +	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
> +	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
> +	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
> +	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
> +	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
> +	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
> +	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
> +	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
> +	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
> +	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
> +	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
> +	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
> +	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
> +
> +	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
> +		if ((data->pestA[i] >> 63) == 0 &&
> +		    (data->pestB[i] >> 63) == 0)
> +			continue;
> +
> +		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
> +		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
> +	}
> +}
> +
> +/*
> + * Process PCI errors from IOC, PHB, or PE. Here's the list
> + * of expected error types and their severities, as well as
> + * the corresponding action.
> + *
> + * Type                        Severity                Action
> + * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_IOC_DEAD   panic
> + * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_INF        diag_data
> + * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_DEAD   panic
> + * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_FENCED eeh
> + * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_INF        diag_data
> + * OPAL_EEH_ERROR_PE   OPAL_EEH_SEV_PE_ER      eeh
> + */
> +static void pci_err_process(struct pci_controller *hose,
> +			u16 err_type, u16 severity, u16 pe_no)
> +{
> +	PCI_ERR_DBG("PCI_ERR: Process error (%d, %d, %d) on PHB#%x\n",
> +		err_type, severity, pe_no, hose->global_number);
> +
> +	switch (err_type) {
> +	case OPAL_EEH_IOC_ERROR:
> +		if (severity == OPAL_EEH_SEV_IOC_DEAD)
> +			panic("Dead IOC of PHB#%x", hose->global_number);
> +		else if (severity == OPAL_EEH_SEV_INF) {
> +			pci_err_hub_diag_data(hose);
> +			pci_err_release();
> +		}
> +
> +		break;
> +	case OPAL_EEH_PHB_ERROR:
> +		if (severity == OPAL_EEH_SEV_PHB_DEAD)
> +			panic("Dead PHB#%x", hose->global_number);
> +		else if (severity == OPAL_EEH_SEV_PHB_FENCED)
> +			pci_err_check_phb(hose);
> +		else if (severity == OPAL_EEH_SEV_INF) {
> +			pci_err_phb_diag_data(hose);
> +			pci_err_release();
> +		}
> +
> +		break;
> +	case OPAL_EEH_PE_ERROR:
> +		pci_err_check_pe(hose, pe_no);
> +		break;
> +	}
> +}
> +
> +static int pci_err_handler(void *dummy)
> +{
> +	struct pnv_phb *phb;
> +	struct pci_controller *hose, *tmp;
> +	u64 frozen_pe_no;
> +	u16 err_type, severity;
> +	long ret;
> +
> +	while (!kthread_should_stop()) {
> +		down(&pci_err_int_sem);
> +		PCI_ERR_DBG("PCI_ERR: Get PCI error semaphore\n");
> +
> +		list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> +			phb = hose->private_data;
> +restart:
> +			pci_err_take();
> +			ret = opal_pci_next_error(phb->opal_id,
> +					&frozen_pe_no, &err_type, &severity);
> +
> +			/* If OPAL API returns error, we needn't proceed */
> +			if (ret != OPAL_SUCCESS) {
> +				PCI_ERR_DBG("PCI_ERR: Invalid return value on "
> +					    "PHB#%x (0x%lx) from opal_pci_next_error",
> +					    hose->global_number, ret);
> +				pci_err_release();
> +				continue;
> +			}
> +
> +			/* If the PHB doesn't have error, stop processing */
> +			if (err_type == OPAL_EEH_NO_ERROR ||
> +			    severity == OPAL_EEH_SEV_NO_ERROR) {
> +				PCI_ERR_DBG("PCI_ERR: No error found on PHB#%x\n",
> +					hose->global_number);
> +				pci_err_release();
> +				continue;
> +			}
> +
> +			/*
> +			 * Process the error until there're no pending
> +			 * errors on the specific PHB.
> +			 */
> +			pci_err_process(hose, err_type, severity, frozen_pe_no);
> +			goto restart;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * pci_err_init - Initialize PCI error handling component
> + *
> + * It should be done before OPAL interrupts got registered because
> + * that depends on this.
> + */
> +static int __init pci_err_init(void)
> +{
> +	int ret = 0;
> +
> +	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
> +		pr_err("%s: FW_FEATURE_OPALv3 required!\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	pci_err_diag = (char *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
> +	if (!pci_err_diag) {
> +		pr_err("%s: Failed to alloc memory for diag data\n",
> +			__func__);
> +		return -ENOMEM;
> +	}
> +
> +	/* Initialize semaphore */
> +	sema_init(&pci_err_int_sem, 0);
> +	sema_init(&pci_err_seq_sem, 1);
> +
> +	/* Start kthread */
> +	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
> +	if (IS_ERR(pci_err_thread)) {
> +		ret = PTR_ERR(pci_err_thread);
> +		pr_err("%s: Failed to start kthread, ret=%d\n",
> +			__func__, ret);
> +	}
> +
> +	free_page((unsigned long)pci_err_diag);
> +	return ret;
> +}
> +
> +arch_initcall(pci_err_init);
> diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
> index 1f86b80..e4c636e 100644
> --- a/arch/powerpc/platforms/pseries/eeh_event.c
> +++ b/arch/powerpc/platforms/pseries/eeh_event.c
> @@ -84,6 +84,14 @@ static int eeh_event_handler(void * dummy)
>  	eeh_handle_event(pe);
>  	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>  
> +	/*
> +	 * If the event was caused by the error reporting IRQ,
> +	 * we need to release the module so that subsequent events
> +	 * can be fired.
> +	 */
> +	if (event->flag & EEH_EVENT_INT)
> +		pci_err_release();
> +
>  	kfree(event);
>  	mutex_unlock(&eeh_event_mutex);
>  

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 23/27] powernv/opal: Notifier for OPAL events
  2013-06-05  7:34 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
@ 2013-06-12  0:32   ` Benjamin Herrenschmidt
  2013-06-12  3:15     ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-12  0:32 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
> The patch intends to implement the notifier for variable OPAL events.
> It's notable that the notifier can be disabled dynamically. Also, the
> notifier could be fired upon incoming OPAL interrupts, or enabling
> the OPAL notifier.

"This patch implements a notifier to receive a notification on OPAL
event mask changes." is probably better. No need to blurb about
enable/disable, however add something along the lines of

"The notifier is only called as a result of an OPAL interrupt, which
will happen upon reception of FSP messages or PCI errors. Any event
mask change detected as a result of opal_poll_events() will not result
in a notifier call.

With OPALv3, opal_poll_events() will not clear interrupt conditions from
the FSP however, even if it consumes the messages (and thus updates the
event mask). Thus the interrupt notifier is a reliable way to get
the completion for FSP based OPAL operations. The specific list will
be added to the header file."


> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/opal.h       |    3 +
>  arch/powerpc/platforms/powernv/opal.c |   79 ++++++++++++++++++++++++++++++++-
>  2 files changed, 81 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 2880797..64e7c84 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -644,6 +644,9 @@ extern void hvc_opal_init_early(void);
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>  				   int depth, void *data);
>  
> +extern int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t));
> +extern void opal_notifier_enable(bool enable);

Make it two functions

opal_enable_notifier() vs. opal_disable_notifier()

>  extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
>  extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
>  
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 628c564..9bbbf93 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -26,11 +26,20 @@ struct opal {
>  	u64 entry;
>  } opal;
>  
> +struct opal_cb {
> +	struct list_head list;
> +	uint64_t mask;
> +	void (*cb)(uint64_t);
> +};
> +
>  static struct device_node *opal_node;
>  static DEFINE_SPINLOCK(opal_write_lock);
>  extern u64 opal_mc_secondary_handler[];
>  static unsigned int *opal_irqs;
>  static unsigned int opal_irq_count;
> +static LIST_HEAD(opal_notifier);
> +static DEFINE_SPINLOCK(opal_notifier_lock);
> +static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
>  
>  int __init early_init_dt_scan_opal(unsigned long node,
>  				   const char *uname, int depth, void *data)
> @@ -95,6 +104,74 @@ static int __init opal_register_exception_handlers(void)
>  
>  early_initcall(opal_register_exception_handlers);
>  
> +int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t))
> +{
> +	unsigned long flags;
> +	struct opal_cb *p, *tmp;
> +
> +	if (!mask || !cb) {
> +		pr_warning("%s: Invalid argument (%llx, %p)!\n",
> +			__func__, mask, cb);
> +		return -EINVAL;
> +	}
> +
> +	p = kzalloc(sizeof(*p), GFP_KERNEL);
> +	if (!p) {
> +		pr_warning("%s: Out of memory (%llx, %p)!\n",
> +			__func__, mask, cb);
> +		return -ENOMEM;
> +	}
> +	p->mask = mask;
> +	p->cb   = cb;
> +
> +	spin_lock_irqsave(&opal_notifier_lock, flags);
> +	list_for_each_entry(tmp, &opal_notifier, list) {
> +		if (tmp->cb == cb || tmp->mask & mask) {
> +			pr_warning("%s: Duplicate event handler (%llx, %p)\n",
> +				__func__, tmp->mask, tmp->cb);
> +			spin_unlock_irqrestore(&opal_notifier_lock, flags);
> +			kfree(p);
> +			return -EEXIST;
> +		}
> +	}

Don't bother with checking the list already. This is not useful. Also
it's fine for two things to listen on the same event.

> +
> +	list_add_tail(&p->list, &opal_notifier);
> +	spin_unlock_irqrestore(&opal_notifier_lock, flags);
> +
> +	return 0;
> +}
> +
> +static void opal_do_notifier(uint64_t events)
> +{
> +	struct opal_cb *tmp;
> +
> +	if (atomic_read(&opal_notifier_hold))
> +		return;
> +	if (!events)
> +		return;
> +
> +	list_for_each_entry(tmp, &opal_notifier, list) {
> +		if (events & tmp->mask)
> +			tmp->cb(events & tmp->mask);
> +	}
> +}

My idea was to call this if the event bit has changed since the last
time we called opal_do_notifier. IE. Use a static last_notified_mask
and do something like

	changed_mask = last_notified_mask ^ events;

	list_for_each_entry(tmp, &opal_notifier, list) {
		if (changed_mask & tmp->mask)
			tmp->cb(events);

Also, always pass the whole events to the callback, no point in
filtering.

BTW, "tmp" isn't a nice name here.
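
A fuller version of the snippet above, as a self-contained userspace
sketch. The names (struct client, notifier_register(), do_notifier())
are hypothetical, a fixed array replaces the kernel list, and there is
no locking. Updating last_notified_mask after each call is an
assumption, since the snippet does not show where that happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 8

/* Hypothetical client record; the patch uses a list of struct opal_cb. */
struct client {
	uint64_t mask;
	void (*cb)(uint64_t events);
};

static struct client clients[MAX_CLIENTS];
static size_t nr_clients;
static uint64_t last_notified_mask;

/* No duplicate check: several clients may listen on the same event. */
static int notifier_register(uint64_t mask, void (*cb)(uint64_t))
{
	if (!mask || !cb || nr_clients == MAX_CLIENTS)
		return -1;

	clients[nr_clients].mask = mask;
	clients[nr_clients].cb = cb;
	nr_clients++;
	return 0;
}

static void do_notifier(uint64_t events)
{
	/* Bits that differ from what was last reported to the clients. */
	uint64_t changed_mask = last_notified_mask ^ events;
	size_t i;

	assert(nr_clients <= MAX_CLIENTS);
	last_notified_mask = events;	/* assumption: remember every call */

	for (i = 0; i < nr_clients; i++) {
		/* Call on a change, and always pass the whole event mask. */
		if (changed_mask & clients[i].mask)
			clients[i].cb(events);
	}
}

/* Demo callbacks that simply count invocations. */
static int calls_a, calls_b;
static uint64_t seen_a;

static void cb_a(uint64_t events)
{
	calls_a++;
	seen_a = events;
}

static void cb_b(uint64_t events)
{
	(void)events;
	calls_b++;
}
```

Because the test is on changed bits, a client is also called when one
of its bits goes from set to clear, and is not called again while the
mask stays the same.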

> +void opal_notifier_enable(bool enable)
> +{
> +	int64_t rc;
> +	uint64_t evt = 0;
> +
> +	if (enable) {
> +		atomic_set(&opal_notifier_hold, 0);
> +
> +		/* Process pending events */
> +		rc = opal_poll_events(&evt);
> +		if (rc == OPAL_SUCCESS && evt)
> +			opal_do_notifier(evt);
> +	} else
> +		atomic_set(&opal_notifier_hold, 1);
> +}

As I said, two functions.
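
Splitting the bool-taking function in two could look roughly like the
sketch below. It is a userspace model with hypothetical names
(notifier_enable(), notifier_disable(), deliver(), pending_events); the
pending_events variable merely stands in for whatever a poll of the
firmware would return. As in the quoted patch, enabling also processes
events that are already pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_bool notifier_hold;	/* true while notifications are held */
static uint64_t pending_events;		/* stand-in for polled firmware events */
static uint64_t delivered;		/* what the clients have been told */

/* Stand-in for the notifier call: drops events while held. */
static void deliver(uint64_t events)
{
	if (atomic_load(&notifier_hold) || !events)
		return;
	delivered |= events;
}

static void notifier_disable(void)
{
	atomic_store(&notifier_hold, true);
}

static void notifier_enable(void)
{
	atomic_store(&notifier_hold, false);

	/* Process whatever became pending while we were disabled. */
	deliver(pending_events);
	pending_events = 0;
}
```

Each call site then states its intent directly instead of passing a
bare true or false.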

>  int opal_get_chars(uint32_t vtermno, char *buf, int count)
>  {
>  	s64 len, rc;
> @@ -297,7 +374,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
>  
>  	opal_handle_interrupt(virq_to_hw(irq), &events);
>  
> -	/* XXX TODO: Do something with the events */
> +	opal_do_notifier(events);
>  
>  	return IRQ_HANDLED;
>  }

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 23/27] powernv/opal: Notifier for OPAL events
  2013-06-12  0:32   ` Benjamin Herrenschmidt
@ 2013-06-12  3:15     ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-12  3:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Wed, Jun 12, 2013 at 10:32:29AM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
>> The patch intends to implement the notifier for variable OPAL events.
>> It's notable that the notifier can be disabled dynamically. Also, the
>> notifier could be fired upon incoming OPAL interrupts, or enabling
>> the OPAL notifier.
>
>"This patch implements a notifier to receive a notification on OPAL
>event mask changes." is probably better. No need to blurb about
>enable/disable, however add something along the lines of
>
>"The notifier is only called as a result of an OPAL interrupt, which
>will happen upon reception of FSP messages or PCI errors. Any event
>mask change detected as a result of opal_poll_events() will not result
>in a notifier call.
>
>With OPALv3, opal_poll_events() will not clear interrupt conditions from
>the FSP however, even if it consumes the messages (and thus updates the
>event mask). Thus the interrupt notifier is a reliable way to get
>the completion for FSP based OPAL operations. The specific list will
>be added to the header file."
>
>

Thanks, Ben. Will update the changelog accordingly.

>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/opal.h       |    3 +
>>  arch/powerpc/platforms/powernv/opal.c |   79 ++++++++++++++++++++++++++++++++-
>>  2 files changed, 81 insertions(+), 1 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 2880797..64e7c84 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -644,6 +644,9 @@ extern void hvc_opal_init_early(void);
>>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>  				   int depth, void *data);
>>  
>> +extern int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t));
>> +extern void opal_notifier_enable(bool enable);
>
>Make it two functions
>
>opal_enable_notifier() vs. opal_disable_notifier()
>

Ok. Will do.

>>  extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
>>  extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
>>  
>> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
>> index 628c564..9bbbf93 100644
>> --- a/arch/powerpc/platforms/powernv/opal.c
>> +++ b/arch/powerpc/platforms/powernv/opal.c
>> @@ -26,11 +26,20 @@ struct opal {
>>  	u64 entry;
>>  } opal;
>>  
>> +struct opal_cb {
>> +	struct list_head list;
>> +	uint64_t mask;
>> +	void (*cb)(uint64_t);
>> +};
>> +
>>  static struct device_node *opal_node;
>>  static DEFINE_SPINLOCK(opal_write_lock);
>>  extern u64 opal_mc_secondary_handler[];
>>  static unsigned int *opal_irqs;
>>  static unsigned int opal_irq_count;
>> +static LIST_HEAD(opal_notifier);
>> +static DEFINE_SPINLOCK(opal_notifier_lock);
>> +static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
>>  
>>  int __init early_init_dt_scan_opal(unsigned long node,
>>  				   const char *uname, int depth, void *data)
>> @@ -95,6 +104,74 @@ static int __init opal_register_exception_handlers(void)
>>  
>>  early_initcall(opal_register_exception_handlers);
>>  
>> +int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t))
>> +{
>> +	unsigned long flags;
>> +	struct opal_cb *p, *tmp;
>> +
>> +	if (!mask || !cb) {
>> +		pr_warning("%s: Invalid argument (%llx, %p)!\n",
>> +			__func__, mask, cb);
>> +		return -EINVAL;
>> +	}
>> +
>> +	p = kzalloc(sizeof(*p), GFP_KERNEL);
>> +	if (!p) {
>> +		pr_warning("%s: Out of memory (%llx, %p)!\n",
>> +			__func__, mask, cb);
>> +		return -ENOMEM;
>> +	}
>> +	p->mask = mask;
>> +	p->cb   = cb;
>> +
>> +	spin_lock_irqsave(&opal_notifier_lock, flags);
>> +	list_for_each_entry(tmp, &opal_notifier, list) {
>> +		if (tmp->cb == cb || tmp->mask & mask) {
>> +			pr_warning("%s: Duplicate event handler (%llx, %p)\n",
>> +				__func__, tmp->mask, tmp->cb);
>> +			spin_unlock_irqrestore(&opal_notifier_lock, flags);
>> +			kfree(p);
>> +			return -EEXIST;
>> +		}
>> +	}
>
>Don't bother with checking the list already. This is not useful. Also
>it's fine for two things to listen on the same event.
>

Ok. Will update in next revision.

>> +
>> +	list_add_tail(&p->list, &opal_notifier);
>> +	spin_unlock_irqrestore(&opal_notifier_lock, flags);
>> +
>> +	return 0;
>> +}
>> +
>> +static void opal_do_notifier(uint64_t events)
>> +{
>> +	struct opal_cb *tmp;
>> +
>> +	if (atomic_read(&opal_notifier_hold))
>> +		return;
>> +	if (!events)
>> +		return;
>> +
>> +	list_for_each_entry(tmp, &opal_notifier, list) {
>> +		if (events & tmp->mask)
>> +			tmp->cb(events & tmp->mask);
>> +	}
>> +}
>
>My idea was to call this if the event bit has changed since the last
>time we called opal_do_notifier. IE. Use a static last_notified_mask
>and do something like
>
>	changed_mask = last_notified_mask ^ events;
>
>	list_for_each_entry(tmp, &opal_notifier, list) {
>		if (changed_mask & tmp->mask)
>			tmp->cb(events);
>
>Also, always pass the whole events to the callback, no point in
>filtering.
>
>BTW, "tmp" isn't a nice name here.
>

Ok. Will update in next revision:
	- Allow multiple "clients" for same event.
	- Make the variable "tmp" to have better name.

>> +void opal_notifier_enable(bool enable)
>> +{
>> +	int64_t rc;
>> +	uint64_t evt = 0;
>> +
>> +	if (enable) {
>> +		atomic_set(&opal_notifier_hold, 0);
>> +
>> +		/* Process pending events */
>> +		rc = opal_poll_events(&evt);
>> +		if (rc == OPAL_SUCCESS && evt)
>> +			opal_do_notifier(evt);
>> +	} else
>> +		atomic_set(&opal_notifier_hold, 1);
>> +}
>
>As I said, two functions.
>

Ok.

>>  int opal_get_chars(uint32_t vtermno, char *buf, int count)
>>  {
>>  	s64 len, rc;
>> @@ -297,7 +374,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
>>  
>>  	opal_handle_interrupt(virq_to_hw(irq), &events);
>>  
>> -	/* XXX TODO: Do something with the events */
>> +	opal_do_notifier(events);
>>  
>>  	return IRQ_HANDLED;
>>  }
>

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 00/27] EEH Support for PowerNV platform
  2013-06-11  7:46 ` [PATCH v3 00/27] EEH Support for PowerNV platform Benjamin Herrenschmidt
@ 2013-06-12  3:18   ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-12  3:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Tue, Jun 11, 2013 at 05:46:24PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
>> Initially, the series of patches is built based on 3.10.RC1 and the patchset
>> doesn't intend to enable EEH functionality for PHB3 for now. Obviously, PHB3
>> EEH support on PowerNV platform is something to do in future.
>
>One thing missing here is a first patch that moves the eeh core out of
>platform/pseries or things will simply not build if CONFIG_PPC_PSERIES
>isn't enabled :-)
>
>Move the whole lot to arch/powerpc/kernel
>

Ok. I will make that the first patch in the next revision :-)

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-11  7:37   ` Benjamin Herrenschmidt
@ 2013-06-12  3:32     ` Gavin Shan
  2013-06-12  4:19       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-12  3:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Tue, Jun 11, 2013 at 05:37:04PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
>> The patch adds I/O chip backend to retrieve the state for the
>> indicated PE. While the PE state is temporarily unavailable,
>> we return the default wait time (1000ms).
>> 
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/eeh-ioda.c |  102 ++++++++++++++++++++++++++++-
>>  1 files changed, 101 insertions(+), 1 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> index e24622e..3c72321 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> @@ -125,10 +125,110 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
>>  	return ret;
>>  }
>>  
>> +/**
>> + * ioda_eeh_get_state - Retrieve the state of PE
>> + * @pe: EEH PE
>> + * @state: return value
>> + *
>> + * The PE's state should be retrieved from the PEEV, PEST
>> + * IODA tables. Since OPAL exports a function to do that,
>> + * it is better to use it.
>> + */
>> +static int ioda_eeh_get_state(struct eeh_pe *pe, int *state)
>> +{
>
>So everywhere you have this "state" argument which isn't a state but a delay ...
>
>Moreover you only initialize it in one specific case and leave it otherwise
>uninitialized....
>
>At the very least, init it to 0 by default as to not leave a dangling
>"return argument" like that. However, I still have a problem with it:
>

Ok. I will update accordingly in upper layer (eeh-powernv.c)
	- Initialize it to value "0".
	- If necessary, return 1 second.

>> +	case OPAL_EEH_STOPPED_TEMP_UNAVAIL:
>> +		result |= EEH_STATE_UNAVAILABLE;
>> +		if (state)
>> +			*state = 1000;
>> +		break;
>
>This is the *only* case where we return anything here. Why do we bother
>then and not have the upper layer simply wait one second whenever it gets
>a temp unavailable result (btw, you didn't differentiate temp unavailable
>from permanently unavailable in your API).
>

We already differentiate permanent from temporary unavailability through the
return value of the function:
	- EEH_STATE_UNAVAILABLE: temporary unavailability
	- EEH_STATE_NOT_SUPPORT: permanent unavailability

The EEH core will handle the return value (from the function) accordingly.

>This has impacts on patch 18/27 which I'll cover here:
>
>> +/**
>> + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable
>> + * @pe: EEH PE
>> + * @option: operation to be issued
>> + *
>> + * The function is used to control the EEH functionality globally.
>> + * Currently, the following options are supported according to PAPR:
>> + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA
>> + */
>> +static int powernv_eeh_set_option(struct eeh_pe *pe, int option)
>> +{
>> +	struct pci_controller *hose = pe->phb;
>> +	struct pnv_phb *phb = hose->private_data;
>> +	int ret = -EEXIST;
>> +
>> +	/*
>> +	 * What we need do is pass it down for hardware
>> +	 * implementation to handle it.
>> +	 */
>> +	if (phb->eeh_ops && phb->eeh_ops->set_option)
>> +		ret = phb->eeh_ops->set_option(pe, option);
>> +
>> +	return ret;
>> +}
>
>Should we implement something here ? IE. Should we look into
>disabling freezing in the PHB via the firmware ? Or we just don't care ?
>

We just don't care. If EEH functionality has been disabled, we shouldn't
run into the code.

>> +/**
>> + * powernv_eeh_get_pe_addr - Retrieve PE address
>> + * @pe: EEH PE
>> + *
>> + * Retrieve the PE address according to the given traditional
>> + * PCI BDF (Bus/Device/Function) address.
>> + */
>> +static int powernv_eeh_get_pe_addr(struct eeh_pe *pe)
>> +{
>> +	return pe->addr;
>> +}
>>
>> +/**
>> + * powernv_eeh_get_state - Retrieve PE state
>> + * @pe: EEH PE
>> + * @state: return value
>> + *
>> + * Retrieve the state of the specified PE. For IODA-compatible
>> + * platform, it should be retrieved from IODA table. Therefore,
>> + * we prefer passing down to hardware implementation to handle
>> + * it.
>> + */
>> +static int powernv_eeh_get_state(struct eeh_pe *pe, int *state)
>> +{
>> +	struct pci_controller *hose = pe->phb;
>> +	struct pnv_phb *phb = hose->private_data;
>> +	int ret = EEH_STATE_NOT_SUPPORT;
>> +
>> +	if (phb->eeh_ops && phb->eeh_ops->get_state)
>> +		ret = phb->eeh_ops->get_state(pe, state);
>> +
>> +	return ret;
>> +}
>
>Same comments about "state" which is really "delay" and is probably
>not necessary at all ...
>

We need the "delay" in future to support PowerKVM guest. If the
specified PE is being reset, we rely on the delay to hold the
powerkvm guest for a while until the PE reset is done.

>> +/**
>> + * powernv_eeh_reset - Reset the specified PE
>> + * @pe: EEH PE
>> + * @option: reset option
>> + *
>> + * Reset the specified PE
>> + */
>> +static int powernv_eeh_reset(struct eeh_pe *pe, int option)
>> +{
>> +	struct pci_controller *hose = pe->phb;
>> +	struct pnv_phb *phb = hose->private_data;
>> +	int ret = -EEXIST;
>> +
>> +	if (phb->eeh_ops && phb->eeh_ops->reset)
>> +		ret = phb->eeh_ops->reset(pe, option);
>> +
>> +	return ret;
>> +}
>> +
>> +/**
>> + * powernv_eeh_wait_state - Wait for PE state
>> + * @pe: EEH PE
>> + * @max_wait: maximal period in milliseconds
>> + *
>> + * Wait for the state of associated PE. It might take some time
>> + * to retrieve the PE's state.
>> + */
>> +static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait)
>> +{
>> +	int ret;
>> +	int mwait;
>> +
>> +	while (1) {
>> +		ret = powernv_eeh_get_state(pe, &mwait);
>> +
>> +		/*
>> +		 * If the PE's state is temporarily unavailable,
>> +		 * we have to wait for the specified time. Otherwise,
>> +		 * the PE's state will be returned immediately.
>> +		 */
>> +		if (ret != EEH_STATE_UNAVAILABLE)
>> +			return ret;
>
>So here we do a compare, while ret is actually a bit mask ...
>
>In fact, ret should be named state_mask or something like that for clarity
>and you should do a bit test here. Also do you want to differentiate
>permanent unavailability from temp. unavailability ?
>
>> +		max_wait -= mwait;
>
>You decrement max_wait but never test it or use it. You probably mean to
>
>  - Limit mwait to max_wait
>  - If mwait is 0, return
>

Yeah, I will change the code accordingly in next revision.
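
A minimal sketch of one way the reworked wait loop could look, assuming the
returned state is treated as a bit mask and the wait budget is enforced. It
is plain userspace C with a fake backend so it can be compiled on its own.
The SKETCH_* constants, wait_state(), fake_get_state() and fake_sleep() are
hypothetical names. The real flags live in asm/eeh.h and may have different
values.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; not copied from asm/eeh.h. */
#define SKETCH_STATE_UNAVAILABLE	(1 << 0)
#define SKETCH_STATE_NOT_SUPPORT	(1 << 1)
#define SKETCH_STATE_ACTIVE		(1 << 2)
#define SKETCH_DEFAULT_DELAY_MS		1000

typedef int (*get_state_fn)(int *delay_ms);
typedef void (*sleep_fn)(int ms);

/* Poll the PE state until it is available or the budget runs out. */
int wait_state(get_state_fn get_state, sleep_fn do_sleep, int max_wait_ms)
{
	assert(get_state != NULL && do_sleep != NULL);

	for (;;) {
		int delay_ms = 0;	/* never leave the out-parameter dangling */
		int state_mask = get_state(&delay_ms);

		/* Bit test, since the state is a mask and not a single value. */
		if (!(state_mask & SKETCH_STATE_UNAVAILABLE))
			return state_mask;

		/* Budget exhausted: give up instead of looping forever. */
		if (max_wait_ms <= 0)
			return SKETCH_STATE_NOT_SUPPORT;

		/* Fall back to the default delay, then limit it to the budget. */
		if (delay_ms <= 0)
			delay_ms = SKETCH_DEFAULT_DELAY_MS;
		if (delay_ms > max_wait_ms)
			delay_ms = max_wait_ms;

		do_sleep(delay_ms);
		max_wait_ms -= delay_ms;
	}
}

/* Fake backend: reports "temporarily unavailable" for a few polls. */
int fake_polls_left;
int fake_slept_ms;

int fake_get_state(int *delay_ms)
{
	if (fake_polls_left > 0) {
		fake_polls_left--;
		*delay_ms = 1000;
		return SKETCH_STATE_UNAVAILABLE;
	}
	return SKETCH_STATE_ACTIVE;
}

void fake_sleep(int ms) { fake_slept_ms += ms; }
```

Passing the sleep function in as a parameter is only there to make the loop
testable without hardware; the kernel version would call msleep() directly.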

>> +		msleep(mwait);
>> +	}
>> +
>> +	return EEH_STATE_NOT_SUPPORT;
>> +}
>> +
>> +/**
>> + * powernv_eeh_get_log - Retrieve error log
>> + * @pe: EEH PE
>> + * @severity: temporary or permanent error log
>> + * @drv_log: driver log to be combined with retrieved error log
>> + * @len: length of driver log
>> + *
>> + * Retrieve the temporary or permanent error from the PE.
>> + */
>> +static int powernv_eeh_get_log(struct eeh_pe *pe, int severity,
>> +			char *drv_log, unsigned long len)
>> +{
>> +	struct pci_controller *hose = pe->phb;
>> +	struct pnv_phb *phb = hose->private_data;
>> +	int ret = -EEXIST;
>> +
>> +	if (phb->eeh_ops && phb->eeh_ops->get_log)
>> +		ret = phb->eeh_ops->get_log(pe, severity, drv_log, len);
>> +
>> +	return ret;
>> +}
>> +
>> +/**
>> + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE
>> + * @pe: EEH PE
>> + *
>> + * The function will be called to reconfigure the bridges included
>> + * in the specified PE so that the malfunctioning PE can be recovered
>> + * again.
>> + */
>> +static int powernv_eeh_configure_bridge(struct eeh_pe *pe)
>> +{
>> +	struct pci_controller *hose = pe->phb;
>> +	struct pnv_phb *phb = hose->private_data;
>> +	int ret = 0;
>> +
>> +	if (phb->eeh_ops && phb->eeh_ops->configure_bridge)
>> +		ret = phb->eeh_ops->configure_bridge(pe);
>> +
>> +	return ret;
>> +}

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup
  2013-06-11  7:37   ` Benjamin Herrenschmidt
@ 2013-06-12  3:33     ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-12  3:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Tue, Jun 11, 2013 at 05:37:59PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
>> The patch adds backends to retrieve error log and configure p2p
>> bridges for the indicated PE.
>> 
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>
>> +/**
>> + * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE
>> + * @pe: EEH PE
>> + *
>> + * For particular PE, it might have included PCI bridges. In order
>> + * to make the PE work properly, those PCI bridges should be configured
>> + * correctly. However, we need do nothing on P7IOC since the reset
>> + * function will do everything that should be covered by the function.
>> + */
>> +static int ioda_eeh_configure_bridge(struct eeh_pe *pe)
>> +{
>> +	return 0;
>
>Does it now ?
>
>IE. Who reconfigures the windows and other config space bits of P2P
>bridges ? Or is this handled elsewhere in Linux or in the upper levels
>of EEH ? Or is that only needed for the PHB ?
>

The EEH core already covers it, so we needn't do anything here.
If we're going to reset the PHB, firmware will re-initialize the
PHB and the rest will be covered by the EEH core.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-12  3:32     ` Gavin Shan
@ 2013-06-12  4:19       ` Benjamin Herrenschmidt
  2013-06-13  4:26         ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-12  4:19 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Wed, 2013-06-12 at 11:32 +0800, Gavin Shan wrote:

> >Same comments about "state" which is really "delay" and is probably
> >not necessary at all ...
> >
> 
> We need the "delay" in future to support PowerKVM guest. If the
> specified PE is being reset, we rely on the delay to hold the
> powerkvm guest for a while until the PE reset is done.

Do we ? Can't we just rely on "temp unavailable" result and wait 1s when
that happens (then try again) ?

IE, A delay associated with a state doesn't make that much sense
semantically speaking. With a state *transition* maybe but this isn't
what this function is about...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-11  8:13   ` Benjamin Herrenschmidt
@ 2013-06-13  4:14     ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-13  4:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Tue, Jun 11, 2013 at 06:13:55PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-05 at 15:34 +0800, Gavin Shan wrote:
>
>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> index d1fd5d4..68ac408 100644
>> --- a/arch/powerpc/include/asm/eeh.h
>> +++ b/arch/powerpc/include/asm/eeh.h
>> @@ -209,6 +209,12 @@ void eeh_add_device_tree_late(struct pci_bus *);
>>  void eeh_add_sysfs_files(struct pci_bus *);
>>  void eeh_remove_bus_device(struct pci_dev *, int);
>>  
>> +#ifdef CONFIG_PPC_POWERNV
>> +void pci_err_release(void);
>> +#else
>> +static inline void pci_err_release(void) { }
>> +#endif
>
>That business of the EEH core calling back into the powernv code
>directly is gross. We don't do that...
>
>See below for a discussion...
>
>.../...
>

Thanks for the review and comments, Ben.

>> +static void pci_err_take(void)
>> +{
>> +	down(&pci_err_seq_sem);
>> +}
>> +
>> +/**
>> + * pci_err_release - Enable error report for sending events
>> + *
>> + * We're handling EEH events one by one. At any time, there is only
>> + * one EEH event caused by the error IRQ. The function is called to
>> + * enable error reporting so that more EEH events can be sent.
>> + */
>> +void pci_err_release(void)
>> +{
>> +	up(&pci_err_seq_sem);
>> +}
>
>So it's generally bad to keep a semaphore held like that for a long
>time, taken in one corner of the kernel and released in another.
>
>I think you need to do something else. I'm not 100% certain what but
>that doesn't seem right to me.
>
>Also you have two problems I see here:
>
> - A given error will come potentially as both an interrupt and
>a return of ffff's from MMIO. You don't know which one will get it first
>and you end up going through two fairly different code path maybe. IE.
>What happens if interrupts are off for a while on the CPU that is
>targeted by the PHB interrupt and you "detect" a PHB fence as a result
>of an MMIO on another CPU ? Will the normal EEH process clear the fence
>and your interrupt completely miss logging any of those fancy messages
>you added to this file ?
>

Yes, we don't know which one (interrupt or 0xff's from MMIO/PCI-CFG)
comes in first. And no, the normal EEH process will call eeh_ops::get_log()
to collect the log, so we won't lose it.

> - You create another kthread ... we already have one in eeh_event.c,
>why another ?
>

My intent was to keep the EEH core from calling opal_pci_next_error(), since
the EEH core is shared by multiple platforms. In other words, I expected
opal_pci_next_error() to stay in the powernv platform code, which then needs
some mechanism to inject EEH events into the EEH core so that they are handled
in sequence. That's why I created a new kthread.

>I think you need to rethink that part. My idea is that the EEH
>interrupts coming from the OPAL notifier would cause you to queue up
>EEH events just like the current ones.
>
>IE. Everything (including get_next_error) should be done by the one EEH
>thread. This also avoids the needs for those extra semaphores.
>
>One option is to create an event without a PE pointer at all. When
>eeh_event_handler() gets that, it would iterate a new hook,
>eeh_ops->next_error() which returns the PE.
>
>That way you can do your printing for fences etc... and return the
>top-level PE for anything PHB-wide. You may also want to add a flag
>maybe to return non-recoverable events and essentially make EEH just
>remove the offending devices from the system instead of panic'ing (panic
>is never a good idea, for all I know, the dead PHB or dead IOC wasn't
>critical to the system operating normally and you may have killed my
>ability to even recover the logs by panic'ing).
>
>Think a bit about it. I know the RTAS model is fairly different than our
>model here, but I like the idea that on powernv, even if we detect an
>MMIO freeze, we don't directly tell the EEH core to process *that* PE
>but instead do the whole next_error thing as well. If the freeze was the
>result of a fence, there's no point trying to process that specific PE.
>
>Something like a fence would thus look like that:
>
> - [ Case 1 -> fence interrupt -> queue eeh_event with no PE ]
>   [ Case 2 -> MMIO freeze detected -> queue eeh event with no PE ]
>
> - eeh_event_handler() sees no PE, loops around eeh_ops->get_next_error,
>since we are single threaded in the EEH thread, it's ok for the IODA
>backend to "cache" the current error data so that subsequent calls into
>the backend know what we are doing.
>
> - get_next_error sees the fence, returns the top-level PE and starts
>the reset (don't wait)
>
> - eeh_event_handler() calls the drivers for all devices on that PE
>(including children) to notify them something's wrong (TODO: Add passing
>by the upper level that this is a fatal error and don't attempt to
>recover).
>
> - It then calls wait_state() which knows it's waiting on a fence, and
>do the appropriate waiting etc...
>
> - Back to normal process...
>
>Don't you think that might be cleaner ? Or do you see a gaping hole in
>my description ?
>

It would incur lots of "unnecessary" EEH events. Normally, we send one
EEH event, and that event has a specific PE (corresponding either to the PHB
or to a real PE). Before the event is queued, the corresponding PE is marked
as "isolated". Once the PE is in the "isolated" state, we won't create
another event if we detect that the PE is frozen again.

I think we can remove pci_err_release()/pci_err_take() by:

  - Exporting functions to control "confirm_error_lock" (defined in eeh.c),
    for example eeh_serialize_lock()/eeh_serialize_unlock().
  - On detecting a fenced PHB or frozen PE through an interrupt or MMIO
    access, calling eeh_serialize_lock() and creating no EEH event if
    the PHB or PE is already marked "isolated". Otherwise, we create
    an EEH event and queue it for further processing.
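
A minimal sketch of the check described above, as plain userspace C. It
assumes that both the interrupt path and the MMIO path go through the same
helper. The sketch_* names are hypothetical, and an integer flag stands in
for "confirm_error_lock", which is a real lock in eeh.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_pe {
	int addr;
	bool isolated;	/* stands in for the EEH "isolated" PE state */
};

int queued_events;		/* number of EEH events created so far */
int serialize_lock_held;	/* stand-in for confirm_error_lock */

void sketch_serialize_lock(void)
{
	assert(!serialize_lock_held);
	serialize_lock_held = 1;
}

void sketch_serialize_unlock(void)
{
	assert(serialize_lock_held);
	serialize_lock_held = 0;
}

/*
 * Called from both detection paths (error interrupt and 0xFF's from
 * MMIO/config reads). Returns true only if a new event was queued.
 */
bool sketch_report_failure(struct sketch_pe *pe)
{
	bool queued = false;

	assert(pe != NULL);

	sketch_serialize_lock();
	if (!pe->isolated) {
		/* Mark the PE before queueing so later detections are dropped. */
		pe->isolated = true;
		queued_events++;
		queued = true;
	}
	sketch_serialize_unlock();

	return queued;
}
```

Whichever path sees the failure first queues the single event, and the
second detection of the same frozen PE is dropped.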


>> +static void pci_err_hub_diag_common(struct OpalIoP7IOCErrorData *data)
>> +{
>> +	/* GEM */
>> +	pr_info("  GEM XFIR:        %016llx\n", data->gemXfir);
>> +	pr_info("  GEM RFIR:        %016llx\n", data->gemRfir);
>> +	pr_info("  GEM RIRQFIR:     %016llx\n", data->gemRirqfir);
>> +	pr_info("  GEM Mask:        %016llx\n", data->gemMask);
>> +	pr_info("  GEM RWOF:        %016llx\n", data->gemRwof);
>> +
>> +	/* LEM */
>> +	pr_info("  LEM FIR:         %016llx\n", data->lemFir);
>> +	pr_info("  LEM Error Mask:  %016llx\n", data->lemErrMask);
>> +	pr_info("  LEM Action 0:    %016llx\n", data->lemAction0);
>> +	pr_info("  LEM Action 1:    %016llx\n", data->lemAction1);
>> +	pr_info("  LEM WOF:         %016llx\n", data->lemWof);
>> +}
>
>That stuff is P7IOC specific. Make sure you make it clear in the
>function name and that you check the diag data "type". IE. Use a new
>diag_data2 function that returns a type. We can obsolete the old one.
>

Ok. Will update in next revision.

>> +static void pci_err_hub_diag_data(struct pci_controller *hose)
>> +{
>> +	struct pnv_phb *phb = hose->private_data;
>> +	struct OpalIoP7IOCErrorData *data;
>> +	long ret;
>> +
>> +	data = (struct OpalIoP7IOCErrorData *)pci_err_diag;
>> +	ret = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
>> +	if (ret != OPAL_SUCCESS) {
>> +		pr_warning("%s: Failed to get HUB#%llx diag-data, ret=%ld\n",
>> +			__func__, phb->hub_id, ret);
>> +		return;
>> +	}
>> +
>> +	/* Check the error type */
>> +	if (data->type <= OPAL_P7IOC_DIAG_TYPE_NONE ||
>> +	    data->type >= OPAL_P7IOC_DIAG_TYPE_LAST) {
>> +		pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n",
>> +			__func__, phb->hub_id, data->type);
>> +		return;
>> +	}
>> +
>> +	switch (data->type) {
>> +	case OPAL_P7IOC_DIAG_TYPE_RGC:
>> +		pr_info("P7IOC diag-data for RGC\n\n");
>> +		pci_err_hub_diag_common(data);
>> +		pr_info("  RGC Status:      %016llx\n", data->rgc.rgcStatus);
>> +		pr_info("  RGC LDCP:        %016llx\n", data->rgc.rgcLdcp);
>> +		break;
>> +	case OPAL_P7IOC_DIAG_TYPE_BI:
>> +		pr_info("P7IOC diag-data for BI %s\n\n",
>> +			data->bi.biDownbound ? "Downbound" : "Upbound");
>> +		pci_err_hub_diag_common(data);
>> +		pr_info("  BI LDCP 0:       %016llx\n", data->bi.biLdcp0);
>> +		pr_info("  BI LDCP 1:       %016llx\n", data->bi.biLdcp1);
>> +		pr_info("  BI LDCP 2:       %016llx\n", data->bi.biLdcp2);
>> +		pr_info("  BI Fence Status: %016llx\n", data->bi.biFenceStatus);
>> +		break;
>> +	case OPAL_P7IOC_DIAG_TYPE_CI:
>> +		pr_info("P7IOC diag-data for CI Port %d\n\n",
>> +			data->ci.ciPort);
>> +		pci_err_hub_diag_common(data);
>> +		pr_info("  CI Port Status:  %016llx\n", data->ci.ciPortStatus);
>> +		pr_info("  CI Port LDCP:    %016llx\n", data->ci.ciPortLdcp);
>> +		break;
>> +	case OPAL_P7IOC_DIAG_TYPE_MISC:
>> +		pr_info("P7IOC diag-data for MISC\n\n");
>> +		pci_err_hub_diag_common(data);
>> +		break;
>> +	case OPAL_P7IOC_DIAG_TYPE_I2C:
>> +		pr_info("P7IOC diag-data for I2C\n\n");
>> +		pci_err_hub_diag_common(data);
>> +		break;
>> +	}
>> +}
>> +
>> +static void pci_err_phb_diag_data(struct pci_controller *hose)
>> +{
>> +	struct pnv_phb *phb = hose->private_data;
>> +	struct OpalIoP7IOCPhbErrorData *data;
>> +	int i;
>> +	long ret;
>> +
>> +	data = (struct OpalIoP7IOCPhbErrorData *)pci_err_diag;
>> +	ret = opal_pci_get_phb_diag_data2(phb->opal_id, data, PAGE_SIZE);
>> +	if (ret != OPAL_SUCCESS) {
>> +		pr_warning("%s: Failed to get diag-data for PHB#%x, ret=%ld\n",
>> +			__func__, hose->global_number, ret);
>> +		return;
>> +	}
>> +
>> +	pr_info("PHB#%x Diag-data\n\n", hose->global_number);
>> +	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
>> +
>> +	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
>> +	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
>> +	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
>> +
>> +	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
>> +	pr_info("  slotStatus:           %08x\n", data->slotStatus);
>> +	pr_info("  linkStatus:           %08x\n", data->linkStatus);
>> +	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
>> +	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
>> +
>> +	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
>> +	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
>> +	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
>> +	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
>> +	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
>> +	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
>> +	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
>> +	pr_info("  sourceId:             %08x\n", data->sourceId);
>> +
>> +	pr_info("  errorClass:           %016llx\n", data->errorClass);
>> +	pr_info("  correlator:           %016llx\n", data->correlator);
>> +	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
>> +	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
>> +	pr_info("  lemFir:               %016llx\n", data->lemFir);
>> +	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
>> +	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
>> +	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
>> +	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
>> +	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
>> +	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
>> +	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
>> +	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
>> +	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
>> +	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
>> +	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
>> +	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
>> +	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
>> +	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
>> +	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
>> +	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
>> +	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
>> +	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
>> +
>> +	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
>> +		if ((data->pestA[i] >> 63) == 0 &&
>> +		    (data->pestB[i] >> 63) == 0)
>> +			continue;
>> +
>> +		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
>> +		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
>> +	}
>> +}
>> +
>> +/*
>> + * Process PCI errors from IOC, PHB, or PE. Here's the list
>> + * of expected error types and their severities, as well as
>> + * the corresponding action.
>> + *
>> + * Type                Severity                 Action
>> + * OPAL_EEH_IOC_ERROR  OPAL_EEH_SEV_IOC_DEAD    panic
>> + * OPAL_EEH_IOC_ERROR  OPAL_EEH_SEV_INF         diag_data
>> + * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_PHB_DEAD    panic
>> + * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_PHB_FENCED  eeh
>> + * OPAL_EEH_PHB_ERROR  OPAL_EEH_SEV_INF         diag_data
>> + * OPAL_EEH_PE_ERROR   OPAL_EEH_SEV_PE_ER       eeh
>> + */
>> +static void pci_err_process(struct pci_controller *hose,
>> +			u16 err_type, u16 severity, u16 pe_no)
>> +{
>> +	PCI_ERR_DBG("PCI_ERR: Process error (%d, %d, %d) on PHB#%x\n",
>> +		err_type, severity, pe_no, hose->global_number);
>> +
>> +	switch (err_type) {
>> +	case OPAL_EEH_IOC_ERROR:
>> +		if (severity == OPAL_EEH_SEV_IOC_DEAD)
>> +			panic("Dead IOC of PHB#%x", hose->global_number);
>> +		else if (severity == OPAL_EEH_SEV_INF) {
>> +			pci_err_hub_diag_data(hose);
>> +			pci_err_release();
>> +		}
>> +
>> +		break;
>> +	case OPAL_EEH_PHB_ERROR:
>> +		if (severity == OPAL_EEH_SEV_PHB_DEAD)
>> +			panic("Dead PHB#%x", hose->global_number);
>> +		else if (severity == OPAL_EEH_SEV_PHB_FENCED)
>> +			pci_err_check_phb(hose);
>> +		else if (severity == OPAL_EEH_SEV_INF) {
>> +			pci_err_phb_diag_data(hose);
>> +			pci_err_release();
>> +		}
>> +
>> +		break;
>> +	case OPAL_EEH_PE_ERROR:
>> +		pci_err_check_pe(hose, pe_no);
>> +		break;
>> +	}
>> +}
>> +
>> +static int pci_err_handler(void *dummy)
>> +{
>> +	struct pnv_phb *phb;
>> +	struct pci_controller *hose, *tmp;
>> +	u64 frozen_pe_no;
>> +	u16 err_type, severity;
>> +	long ret;
>> +
>> +	while (!kthread_should_stop()) {
>> +		down(&pci_err_int_sem);
>> +		PCI_ERR_DBG("PCI_ERR: Get PCI error semaphore\n");
>> +
>> +		list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>> +			phb = hose->private_data;
>> +restart:
>> +			pci_err_take();
>> +			ret = opal_pci_next_error(phb->opal_id,
>> +					&frozen_pe_no, &err_type, &severity);
>> +
>> +			/* If OPAL API returns error, we needn't proceed */
>> +			if (ret != OPAL_SUCCESS) {
>> +				PCI_ERR_DBG("PCI_ERR: Invalid return value on "
>> +					    "PHB#%x (0x%lx) from opal_pci_next_error",
>> +					    hose->global_number, ret);
>> +				pci_err_release();
>> +				continue;
>> +			}
>> +
>> +			/* If the PHB doesn't have error, stop processing */
>> +			if (err_type == OPAL_EEH_NO_ERROR ||
>> +			    severity == OPAL_EEH_SEV_NO_ERROR) {
>> +				PCI_ERR_DBG("PCI_ERR: No error found on PHB#%x\n",
>> +					hose->global_number);
>> +				pci_err_release();
>> +				continue;
>> +			}
>> +
>> +			/*
>> +			 * Process the error until there're no pending
>> +			 * errors on the specific PHB.
>> +			 */
>> +			pci_err_process(hose, err_type, severity, frozen_pe_no);
>> +			goto restart;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * pci_err_init - Initialize PCI error handling component
>> + *
>> + * It should be done before OPAL interrupts got registered because
>> + * that depends on this.
>> + */
>> +static int __init pci_err_init(void)
>> +{
>> +	int ret = 0;
>> +
>> +	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
>> +		pr_err("%s: FW_FEATURE_OPALv3 required!\n",
>> +			__func__);
>> +		return -EINVAL;
>> +	}
>> +
>> +	pci_err_diag = (char *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
>> +	if (!pci_err_diag) {
>> +		pr_err("%s: Failed to alloc memory for diag data\n",
>> +			__func__);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	/* Initialize semaphore */
>> +	sema_init(&pci_err_int_sem, 0);
>> +	sema_init(&pci_err_seq_sem, 1);
>> +
>> +	/* Start kthread */
>> +	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
>> +	if (IS_ERR(pci_err_thread)) {
>> +		ret = PTR_ERR(pci_err_thread);
>> +		pr_err("%s: Failed to start kthread, ret=%d\n",
>> +			__func__, ret);
>> +		/* Free the diag page only on failure; the kthread uses it */
>> +		free_page((unsigned long)pci_err_diag);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +arch_initcall(pci_err_init);
>> diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
>> index 1f86b80..e4c636e 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_event.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_event.c
>> @@ -84,6 +84,14 @@ static int eeh_event_handler(void * dummy)
>>  	eeh_handle_event(pe);
>>  	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>>  
>> +	/*
>> +	 * If the event was caused by the error reporting IRQ,
>> +	 * we need to release the module so that subsequent events
>> +	 * can be fired.
>> +	 */
>> +	if (event->flag & EEH_EVENT_INT)
>> +		pci_err_release();
>> +
>>  	kfree(event);
>>  	mutex_unlock(&eeh_event_mutex);
>>  

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-12  4:19       ` Benjamin Herrenschmidt
@ 2013-06-13  4:26         ` Gavin Shan
  2013-06-13  4:42           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 43+ messages in thread
From: Gavin Shan @ 2013-06-13  4:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Wed, Jun 12, 2013 at 02:19:25PM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-12 at 11:32 +0800, Gavin Shan wrote:
>
>> >Same comments about "state" which is really "delay" and is probably
>> >not necessary at all ...
>> >
>> 
>> We need the "delay" in future to support PowerKVM guest. If the
>> specified PE is being reset, we rely on the delay to hold the
>> powerkvm guest for a while until the PE reset is done.
>
>Do we ? Can't we just rely on "temp unavailable" result and wait 1s when
>that happens (then try again) ?
>
>IE, A delay associated with a state doesn't make that much sense
>semantically speaking. With a state *transition* maybe but this isn't
>what this function is about...
>

Sorry, Ben. I should have explained this more clearly. Basically, the EEH
core is going to be shared by powernv and by pseries on top of either powernv
or phyp. When running pseries on top of phyp, we get the PE state through the
RTAS call "ibm,read-slot-reset-state2", and f/w returns the desired delay for
a temporarily unavailable PE. In the future, the function
ioda_eeh_get_state() will be called directly to emulate the RTAS call
for a guest running on top of PowerNV.

So the answer is that we can either assume that f/w won't return a valid
delay and use the default value (1 second) for a guest on powernv or phyp,
or keep the delay here.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-13  4:26         ` Gavin Shan
@ 2013-06-13  4:42           ` Benjamin Herrenschmidt
  2013-06-13  5:50             ` Gavin Shan
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-13  4:42 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Thu, 2013-06-13 at 12:26 +0800, Gavin Shan wrote:
> So the answer is we can do it by making the assumption that f/w won't
> return valid delay and we're going to use default value (1 second) for
> guest on powernv or phyp, or we keep the delay here.

Ok, at the very least then change the name to "unavailable_delay" or
something explicit like that then :-)

BTW. I've already applied patches 1 and 2 to my tree, you don't have to
resend those. They'll show up today or tomorrow when I push my next
branch out.

Cheers,
Ben.


* Re: [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-13  4:42           ` Benjamin Herrenschmidt
@ 2013-06-13  5:50             ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-13  5:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Thu, Jun 13, 2013 at 02:42:17PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-06-13 at 12:26 +0800, Gavin Shan wrote:
>> So the answer is we can do it by making the assumption that f/w won't
>> return valid delay and we're going to use default value (1 second) for
>> guest on powernv or phyp, or we keep the delay here.
>
>Ok, at the very least then change the name to "unavailable_delay" or
>something explicit like that then :-)
>

Ok.

>BTW. I've already applied patches 1 and 2 to my tree, you don't have to
>resend those. They'll show up today or tomorrow when I push my next
>branch out.
>

Ok. Thanks, Ben.

Thanks,
Gavin


* [PATCH 20/27] powerpc/eeh: Enable EEH check for config access
  2013-06-15  9:02 [PATCH v4 " Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  0 siblings, 0 replies; 43+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

This patch enables the EEH check for config space accesses on the
PowerNV platform and lets the EEH core process the resulting EEH
errors. The existing implementation already had a mechanism to check
for EEH errors and tried to recover from them, but it never let the
EEH core handle those errors.
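
For reference, the read path only involves the EEH core when the value
read back looks like an error pattern for the access width. The snippet
below is a rough sketch of that test, assuming the pattern is all ones in
the low "size" bytes (the 0xFF's mentioned in the cover letter). The macro
and function names are hypothetical; the patch itself uses
EEH_IO_ERROR_VALUE(size), whose definition is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: all ones in the low "size" bytes (size is 1, 2 or 4). */
#define ALL_ONES_FOR_SIZE(size)	(~0U >> ((4 - (size)) * 8))

/*
 * Return nonzero when a config read of "size" bytes returned all ones,
 * i.e. when it is worth asking the EEH core whether the PE is frozen.
 */
static int looks_like_eeh_error(uint32_t val, int size)
{
	return val == ALL_ONES_FOR_SIZE(size);
}
```

A 2-byte read that returns 0xFFFF would trigger the check, while 0xFF
returned from a 4-byte read would not.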

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.c |   40 +++++++++++++++++++++++++++++++++-
 1 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 20af220..6d9a506 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -32,6 +32,8 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/firmware.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -259,6 +261,10 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_EEH
+	struct device_node *busdn, *dn;
+	struct eeh_pe *phb_pe = NULL;
+#endif
 	u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
 	s64 rc;
 
@@ -291,8 +297,34 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 	cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n",
 		bus->number, devfn, where, size, *val);
 
-	/* Check if the PHB got frozen due to an error (no response) */
+	/*
+	 * Check if the specified PE has been put into frozen
+	 * state. On the other hand, we needn't do that while
+	 * the PHB has been put into frozen state because of
+	 * PHB-fatal errors.
+	 */
+#ifdef CONFIG_EEH
+	phb_pe = eeh_phb_pe_get(hose);
+	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
+		return PCIBIOS_SUCCESSFUL;
+
+	if (phb->eeh_enabled) {
+		if (*val == EEH_IO_ERROR_VALUE(size)) {
+			busdn = pci_bus_to_OF_node(bus);
+			for (dn = busdn->child; dn; dn = dn->sibling) {
+				struct pci_dn *pdn = PCI_DN(dn);
+
+				if (pdn && pdn->devfn == devfn &&
+				    eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
+					return PCIBIOS_DEVICE_NOT_FOUND;
+			}
+		}
+	} else {
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+	}
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
@@ -323,8 +355,14 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 	default:
 		return PCIBIOS_FUNC_NOT_SUPPORTED;
 	}
+
 	/* Check if the PHB got frozen due to an error (no response) */
+#ifdef CONFIG_EEH
+	if (!phb->eeh_enabled)
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
-- 
1.7.5.4


end of thread, other threads:[~2013-06-15  9:03 UTC | newest]

Thread overview: 43+ messages
2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
2013-06-05  7:34 ` [PATCH 01/27] powerpc/eeh: Fix fetching bus for single-dev-PE Gavin Shan
2013-06-05  7:34 ` [PATCH 02/27] powerpc/eeh: Enhance converting EEH dev Gavin Shan
2013-06-05  7:34 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
2013-06-05  7:34 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
2013-06-05  7:34 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
2013-06-05  7:34 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
2013-06-05  7:34 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
2013-06-05  7:34 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
2013-06-05  7:34 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
2013-06-05  7:34 ` [PATCH 10/27] powerpc/eeh: Differentiate EEH events Gavin Shan
2013-06-05  7:34 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
2013-06-05  7:34 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
2013-06-05  7:34 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
2013-06-05  7:34 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
2013-06-05  7:34 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
2013-06-11  7:37   ` Benjamin Herrenschmidt
2013-06-12  3:32     ` Gavin Shan
2013-06-12  4:19       ` Benjamin Herrenschmidt
2013-06-13  4:26         ` Gavin Shan
2013-06-13  4:42           ` Benjamin Herrenschmidt
2013-06-13  5:50             ` Gavin Shan
2013-06-05  7:34 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
2013-06-05  7:34 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
2013-06-11  7:37   ` Benjamin Herrenschmidt
2013-06-12  3:33     ` Gavin Shan
2013-06-05  7:34 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
2013-06-05  7:34 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
2013-06-05  7:34 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
2013-06-05  7:34 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
2013-06-11  8:13   ` Benjamin Herrenschmidt
2013-06-13  4:14     ` Gavin Shan
2013-06-05  7:34 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
2013-06-05  7:34 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
2013-06-12  0:32   ` Benjamin Herrenschmidt
2013-06-12  3:15     ` Gavin Shan
2013-06-05  7:34 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
2013-06-05  7:34 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
2013-06-05  7:34 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
2013-06-05  7:34 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
2013-06-11  7:46 ` [PATCH v3 00/27] EEH Support for PowerNV platform Benjamin Herrenschmidt
2013-06-12  3:18   ` Gavin Shan
  -- strict thread matches above, loose matches on Subject: below --
2013-06-15  9:02 [PATCH v4 " Gavin Shan
2013-06-15  9:03 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
