[PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error
  2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
@ 2013-06-05  7:34 ` Gavin Shan
  0 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-05  7:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch intends to register OPAL event notifier and process the
PCI errors from firmware. If we have pending PCI errors, the kthread
will be invoke to handle that in turn.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-err.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
index bfc95c6..d77dd61 100644
--- a/arch/powerpc/platforms/powernv/pci-err.c
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -64,6 +64,12 @@ static struct semaphore pci_err_int_sem;
 static struct semaphore pci_err_seq_sem;
 static char *pci_err_diag;
 
+static void pci_err_event(u64 event)
+{
+	/* Notify kthread to process error */
+	up(&pci_err_int_sem);
+}
+
 static void pci_err_take(void)
 {
 	down(&pci_err_seq_sem);
@@ -451,6 +457,14 @@ static int __init pci_err_init(void)
 	sema_init(&pci_err_int_sem, 0);
 	sema_init(&pci_err_seq_sem, 1);
 
+	/* Register OPAL event notifier */
+	ret = opal_notifier_register(OPAL_EVENT_PCI_ERROR, pci_err_event);
+	if (ret) {
+		pr_err("%s: Failed to register OPAL notifier, rc=%d\n",
+			__func__, ret);
+		goto out;
+	}
+
 	/* Start kthread */
 	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
 	if (IS_ERR(pci_err_thread)) {
@@ -459,6 +473,7 @@ static int __init pci_err_init(void)
 			__func__, ret);
 	}
 
+out:
 	free_page((unsigned long)pci_err_diag);
 	return ret;
 }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 00/27] EEH Support for PowerNV platform
@ 2013-06-15  9:02 Gavin Shan
  2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
                   ` (26 more replies)
  0 siblings, 27 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

Initially, the series of patches is built based on 3.10.RC1 and the patchset
doesn't intend to enable EEH functionality for PHB3 for now. Obviously, PHB3
EEH support on PowerNV platform is something to do in future.

The series of patches intends to support EEH for PowerNV platform. The EEH
core already supports multiple probe methods: device tree nodes and PCI
devices. For EEH on PowerNV, we're using PCI devices to do EEH probe, which
is different from the probe type used on pSeries platform. Another point I
should mention is that the overall EEH would be split up to 3 layers: EEH
core, platform layer and I/O chip layer. It would make the EEH on PowerNV
platform can achieve more flexibility and support more I/O chips in future.
Besides, the EEH event can be produced by detecting 0xFF's from reading
PCI config or I/O registers, or from interrupts dedicated for EEH error
reporting. So we have to handle the EEH error interrupts. On the other hand,
the EEH events will be processed by EEH core like pSeries platform does.

We will have exported debugfs entries ("/sys/kernel/debug/powerpc/PCIxxxx/err_injct"),
which allows you to control the 0xD10 register in order to force errors like
frozen PE and fenced PHB for testing purpose. The following example is usualy
what I'm using to control that register. The patchset has been verified on
Firebird-L machine where I have 2 Emulex ethernet card on PHB#0. I keep pinging
to one of the ethernet cards (eth0) from external and then use following commands
to produce frozen PE or fenced PHB errors. Eventually, the errors can be recovered
and the ethernet card is reachable after temporary connection lost.

Trigger frozen PE:

	echo 0x0000000002000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct
	sleep 1
	echo 0x0 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Trigger fenced PHB:

	echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Change log
==========

v3 -> v4:
	* Rebase to 3.10.RC5 with originally first 2 patches from v3 applied and
	  won't resend the first 2 patches again.
	* Add 2 (first) patches to move the EEH core from pSeries platform to
	  arch/powerpc/kernel and applied necessary cleanup.
	* PowerNV platform layer initialize the delay for temporarily unavailable
	  PE state to 0 and set it to default value (1 second) if necessary.
	* Change variable names according to Ben's comments.
	* Account for the maximal allowed waiting time in eeh-powernv.c::powernv_eeh_wait_state()
	* Introduce eeh_serialize_lock/unlock so that pci-err.c can inject EEH
	  event with consistent PE state (isolated/dead state). In a result,
	  pci-err.c::pci_err_seq_sem has been removed completely.
	* Introduce PE state (EEH_PE_PHB_DEAD) and the logic to remove the corresponding
	  PCI domain upon detected dead IOC or PHB, instead of panicing the system.
	* Remove unnecessary contiguous check on one specific PHB in pci-err.c::pci_err_handler().
	* Refactor functions in pci-err.c for printing PHB diag-data. The diag-data header
	  (including version/ioType) have been parsed and call into appropriate function
	  for outputing the diag-data.
	* Changelog adjustment on "OPAL notifier" according to Ben's comments.
	* Split original opal_notifier_enable() to opal_notifier_enable/disable.
	* Allow multiple clients to listen same OPAL event change in OPAL notifier.
	* OPAL notifier is tracing the event change, instead of events.
v2 -> v3:
	* Rebase to 3.10.RC4
	* Replace eeh_pci_dev_traverse() with pci_walk_bus()
	* Changlog adjustment to make that more clear
	* To call msleep() if possible after opal_pci_poll()
	* Make sure we have OPALv3
	* OPAL notifier so that we can register callback for the monitored events.
	  The OPAL notifier is disabled while restarting or powering off the system.
	* Make the debugfs entries something like (PCIxxxx/err_injct)
	* Split the patch so that can be backported to stable kernel
	* Allow to detect fenced PHB proactively (without interrupt)
	* Start to use opal_pci_get_phb_diag_data2()
	* Stack dump upon fenced PHB
v1 -> v2:
	* Rebase to 3.10.RC3
	* Don't fetch PE state for the case of fenced PHB. It usually takes long
	  time and possiblly incurs softlock warning. It requires the corresponding
	  changes for the underly firmware
	* Add debugfs entries so that we can inject errors like frozen PE and
	  fenced PHB for testing purpose

---

arch/powerpc/include/asm/eeh.h                 |   24 +-
arch/powerpc/include/asm/opal.h                |  139 +++-
arch/powerpc/kernel/Makefile                   |    4 +-
arch/powerpc/kernel/eeh.c                      | 1044 ++++++++++++++++++++++++
arch/powerpc/kernel/eeh_cache.c                |  319 ++++++++
arch/powerpc/kernel/eeh_dev.c                  |  112 +++
arch/powerpc/kernel/eeh_driver.c               |  562 +++++++++++++
arch/powerpc/kernel/eeh_event.c                |  142 ++++
arch/powerpc/kernel/eeh_pe.c                   |  675 +++++++++++++++
arch/powerpc/kernel/eeh_sysfs.c                |   75 ++
arch/powerpc/kernel/pci_hotplug.c              |  111 +++
arch/powerpc/platforms/Kconfig                 |    5 +
arch/powerpc/platforms/powernv/Makefile        |    1 +
arch/powerpc/platforms/powernv/eeh-ioda.c      |  538 ++++++++++++
arch/powerpc/platforms/powernv/eeh-powernv.c   |  396 +++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
arch/powerpc/platforms/powernv/opal.c          |   74 ++-
arch/powerpc/platforms/powernv/pci-err.c       |  536 ++++++++++++
arch/powerpc/platforms/powernv/pci-ioda.c      |   38 +-
arch/powerpc/platforms/powernv/pci-p5ioc2.c    |    6 +-
arch/powerpc/platforms/powernv/pci.c           |   43 +-
arch/powerpc/platforms/powernv/pci.h           |   27 +
arch/powerpc/platforms/powernv/setup.c         |    4 +
arch/powerpc/platforms/pseries/Kconfig         |    5 -
arch/powerpc/platforms/pseries/Makefile        |    4 +-
arch/powerpc/platforms/pseries/eeh.c           |  942 ---------------------
arch/powerpc/platforms/pseries/eeh_cache.c     |  319 --------
arch/powerpc/platforms/pseries/eeh_dev.c       |  112 ---
arch/powerpc/platforms/pseries/eeh_driver.c    |  552 -------------
arch/powerpc/platforms/pseries/eeh_event.c     |  142 ----
arch/powerpc/platforms/pseries/eeh_pe.c        |  653 ---------------
arch/powerpc/platforms/pseries/eeh_sysfs.c     |   75 --
arch/powerpc/platforms/pseries/pci_dlpar.c     |   85 --
33 files changed, 4848 insertions(+), 2919 deletions(-)
create mode 100644 arch/powerpc/kernel/eeh.c
create mode 100644 arch/powerpc/kernel/eeh_cache.c
create mode 100644 arch/powerpc/kernel/eeh_dev.c
create mode 100644 arch/powerpc/kernel/eeh_driver.c
create mode 100644 arch/powerpc/kernel/eeh_event.c
create mode 100644 arch/powerpc/kernel/eeh_pe.c
create mode 100644 arch/powerpc/kernel/eeh_sysfs.c
create mode 100644 arch/powerpc/kernel/pci_hotplug.c
create mode 100644 arch/powerpc/platforms/powernv/eeh-ioda.c
create mode 100644 arch/powerpc/platforms/powernv/eeh-powernv.c
create mode 100644 arch/powerpc/platforms/powernv/pci-err.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_cache.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_dev.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_driver.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_event.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_pe.c
delete mode 100644 arch/powerpc/platforms/pseries/eeh_sysfs.c

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/27] powerpc/eeh: Move common part to kernel directory
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-17  3:03   ` Mike Qiu
  2013-06-15  9:02 ` [PATCH 02/27] powerpc/eeh: Cleanup for EEH core Gavin Shan
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch moves the common part of EEH core into arch/powerpc/kernel
directory so that we needn't PPC_PSERIES while compiling POWERNV
platform:

	* Move the EEH common part into arch/powerpc/kernel
	* Move the functions for PCI hotplug from pSeries platform to
	  arch/powerpc/kernel/pci_hotplug.c
	* Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to
	  arch/powerpc/platforms/Kconfig
	* Adjust makefile accordingly

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/Makefile                |    4 +-
 arch/powerpc/kernel/eeh.c                   |  942 +++++++++++++++++++++++++++
 arch/powerpc/kernel/eeh_cache.c             |  319 +++++++++
 arch/powerpc/kernel/eeh_dev.c               |  112 ++++
 arch/powerpc/kernel/eeh_driver.c            |  552 ++++++++++++++++
 arch/powerpc/kernel/eeh_event.c             |  142 ++++
 arch/powerpc/kernel/eeh_pe.c                |  653 +++++++++++++++++++
 arch/powerpc/kernel/eeh_sysfs.c             |   75 +++
 arch/powerpc/kernel/pci_hotplug.c           |  111 ++++
 arch/powerpc/platforms/Kconfig              |    5 +
 arch/powerpc/platforms/pseries/Kconfig      |    5 -
 arch/powerpc/platforms/pseries/Makefile     |    4 +-
 arch/powerpc/platforms/pseries/eeh.c        |  942 ---------------------------
 arch/powerpc/platforms/pseries/eeh_cache.c  |  319 ---------
 arch/powerpc/platforms/pseries/eeh_dev.c    |  112 ----
 arch/powerpc/platforms/pseries/eeh_driver.c |  552 ----------------
 arch/powerpc/platforms/pseries/eeh_event.c  |  142 ----
 arch/powerpc/platforms/pseries/eeh_pe.c     |  653 -------------------
 arch/powerpc/platforms/pseries/eeh_sysfs.c  |   75 ---
 arch/powerpc/platforms/pseries/pci_dlpar.c  |   85 ---
 20 files changed, 2915 insertions(+), 2889 deletions(-)
 create mode 100644 arch/powerpc/kernel/eeh.c
 create mode 100644 arch/powerpc/kernel/eeh_cache.c
 create mode 100644 arch/powerpc/kernel/eeh_dev.c
 create mode 100644 arch/powerpc/kernel/eeh_driver.c
 create mode 100644 arch/powerpc/kernel/eeh_event.c
 create mode 100644 arch/powerpc/kernel/eeh_pe.c
 create mode 100644 arch/powerpc/kernel/eeh_sysfs.c
 create mode 100644 arch/powerpc/kernel/pci_hotplug.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_cache.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_dev.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_driver.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_event.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_pe.c
 delete mode 100644 arch/powerpc/platforms/pseries/eeh_sysfs.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index f960a79..5826906 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -58,6 +58,8 @@ obj-$(CONFIG_RTAS_PROC)		+= rtas-proc.o
 obj-$(CONFIG_LPARCFG)		+= lparcfg.o
 obj-$(CONFIG_IBMVIO)		+= vio.o
 obj-$(CONFIG_IBMEBUS)           += ibmebus.o
+obj-$(CONFIG_EEH)		+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
+				   eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
 obj-$(CONFIG_FA_DUMP)		+= fadump.o
@@ -100,7 +102,7 @@ obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
 
-pci64-$(CONFIG_PPC64)		+= pci_dn.o isa-bridge.o
+pci64-$(CONFIG_PPC64)		+= pci_hotplug.o pci_dn.o isa-bridge.o
 obj-$(CONFIG_PCI)		+= pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \
 				   pci-common.o pci_of_scan.o
 obj-$(CONFIG_PCI_MSI)		+= msi.o
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
new file mode 100644
index 0000000..6b73d6c
--- /dev/null
+++ b/arch/powerpc/kernel/eeh.c
@@ -0,0 +1,942 @@
+/*
+ * Copyright IBM Corporation 2001, 2005, 2006
+ * Copyright Dave Engebretsen & Todd Inglett 2001
+ * Copyright Linas Vepstas 2005, 2006
+ * Copyright 2001-2012 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Please address comments and feedback to Linas Vepstas <linas@austin.ibm.com>
+ */
+
+#include <linux/delay.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <linux/export.h>
+#include <linux/of.h>
+
+#include <linux/atomic.h>
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+#include <asm/ppc-pci.h>
+#include <asm/rtas.h>
+
+
+/** Overview:
+ *  EEH, or "Extended Error Handling" is a PCI bridge technology for
+ *  dealing with PCI bus errors that can't be dealt with within the
+ *  usual PCI framework, except by check-stopping the CPU.  Systems
+ *  that are designed for high-availability/reliability cannot afford
+ *  to crash due to a "mere" PCI error, thus the need for EEH.
+ *  An EEH-capable bridge operates by converting a detected error
+ *  into a "slot freeze", taking the PCI adapter off-line, making
+ *  the slot behave, from the OS'es point of view, as if the slot
+ *  were "empty": all reads return 0xff's and all writes are silently
+ *  ignored.  EEH slot isolation events can be triggered by parity
+ *  errors on the address or data busses (e.g. during posted writes),
+ *  which in turn might be caused by low voltage on the bus, dust,
+ *  vibration, humidity, radioactivity or plain-old failed hardware.
+ *
+ *  Note, however, that one of the leading causes of EEH slot
+ *  freeze events are buggy device drivers, buggy device microcode,
+ *  or buggy device hardware.  This is because any attempt by the
+ *  device to bus-master data to a memory address that is not
+ *  assigned to the device will trigger a slot freeze.   (The idea
+ *  is to prevent devices-gone-wild from corrupting system memory).
+ *  Buggy hardware/drivers will have a miserable time co-existing
+ *  with EEH.
+ *
+ *  Ideally, a PCI device driver, when suspecting that an isolation
+ *  event has occurred (e.g. by reading 0xff's), will then ask EEH
+ *  whether this is the case, and then take appropriate steps to
+ *  reset the PCI slot, the PCI device, and then resume operations.
+ *  However, until that day,  the checking is done here, with the
+ *  eeh_check_failure() routine embedded in the MMIO macros.  If
+ *  the slot is found to be isolated, an "EEH Event" is synthesized
+ *  and sent out for processing.
+ */
+
+/* If a device driver keeps reading an MMIO register in an interrupt
+ * handler after a slot isolation event, it might be broken.
+ * This sets the threshold for how many read attempts we allow
+ * before printing an error message.
+ */
+#define EEH_MAX_FAILS	2100000
+
+/* Time to wait for a PCI slot to report status, in milliseconds */
+#define PCI_BUS_RESET_WAIT_MSEC (60*1000)
+
+/* Platform dependent EEH operations */
+struct eeh_ops *eeh_ops = NULL;
+
+int eeh_subsystem_enabled;
+EXPORT_SYMBOL(eeh_subsystem_enabled);
+
+/*
+ * EEH probe mode support. The intention is to support multiple
+ * platforms for EEH. Some platforms like pSeries do PCI emunation
+ * based on device tree. However, other platforms like powernv probe
+ * PCI devices from hardware. The flag is used to distinguish that.
+ * In addition, struct eeh_ops::probe would be invoked for particular
+ * OF node or PCI device so that the corresponding PE would be created
+ * there.
+ */
+int eeh_probe_mode;
+
+/* Global EEH mutex */
+DEFINE_MUTEX(eeh_mutex);
+
+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_RAW_SPINLOCK(confirm_error_lock);
+
+/* Buffer for reporting pci register dumps. Its here in BSS, and
+ * not dynamically alloced, so that it ends up in RMO where RTAS
+ * can access it.
+ */
+#define EEH_PCI_REGS_LOG_LEN 4096
+static unsigned char pci_regs_buf[EEH_PCI_REGS_LOG_LEN];
+
+/*
+ * The struct is used to maintain the EEH global statistic
+ * information. Besides, the EEH global statistics will be
+ * exported to user space through procfs
+ */
+struct eeh_stats {
+	u64 no_device;		/* PCI device not found		*/
+	u64 no_dn;		/* OF node not found		*/
+	u64 no_cfg_addr;	/* Config address not found	*/
+	u64 ignored_check;	/* EEH check skipped		*/
+	u64 total_mmio_ffs;	/* Total EEH checks		*/
+	u64 false_positives;	/* Unnecessary EEH checks	*/
+	u64 slot_resets;	/* PE reset			*/
+};
+
+static struct eeh_stats eeh_stats;
+
+#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
+
+/**
+ * eeh_gather_pci_data - Copy assorted PCI config space registers to buff
+ * @edev: device to report data for
+ * @buf: point to buffer in which to log
+ * @len: amount of room in buffer
+ *
+ * This routine captures assorted PCI configuration space data,
+ * and puts them into a buffer for RTAS error logging.
+ */
+static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
+{
+	struct device_node *dn = eeh_dev_to_of_node(edev);
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	u32 cfg;
+	int cap, i;
+	int n = 0;
+
+	n += scnprintf(buf+n, len-n, "%s\n", dn->full_name);
+	printk(KERN_WARNING "EEH: of node=%s\n", dn->full_name);
+
+	eeh_ops->read_config(dn, PCI_VENDOR_ID, 4, &cfg);
+	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
+	printk(KERN_WARNING "EEH: PCI device/vendor: %08x\n", cfg);
+
+	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cfg);
+	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
+	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);
+
+	if (!dev) {
+		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
+		return n;
+	}
+
+	/* Gather bridge-specific registers */
+	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
+		eeh_ops->read_config(dn, PCI_SEC_STATUS, 2, &cfg);
+		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
+		printk(KERN_WARNING "EEH: Bridge secondary status: %04x\n", cfg);
+
+		eeh_ops->read_config(dn, PCI_BRIDGE_CONTROL, 2, &cfg);
+		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
+		printk(KERN_WARNING "EEH: Bridge control: %04x\n", cfg);
+	}
+
+	/* Dump out the PCI-X command and status regs */
+	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
+	if (cap) {
+		eeh_ops->read_config(dn, cap, 4, &cfg);
+		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
+		printk(KERN_WARNING "EEH: PCI-X cmd: %08x\n", cfg);
+
+		eeh_ops->read_config(dn, cap+4, 4, &cfg);
+		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
+		printk(KERN_WARNING "EEH: PCI-X status: %08x\n", cfg);
+	}
+
+	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
+	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	if (cap) {
+		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
+		printk(KERN_WARNING
+		       "EEH: PCI-E capabilities and status follow:\n");
+
+		for (i=0; i<=8; i++) {
+			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
+			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
+			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
+		}
+
+		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
+		if (cap) {
+			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
+			printk(KERN_WARNING
+			       "EEH: PCI-E AER capability register set follows:\n");
+
+			for (i=0; i<14; i++) {
+				eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
+				n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
+				printk(KERN_WARNING "EEH: PCI-E AER %02x: %08x\n", i, cfg);
+			}
+		}
+	}
+
+	return n;
+}
+
+/**
+ * eeh_slot_error_detail - Generate combined log including driver log and error log
+ * @pe: EEH PE
+ * @severity: temporary or permanent error log
+ *
+ * This routine should be called to generate the combined log, which
+ * is comprised of driver log and error log. The driver log is figured
+ * out from the config space of the corresponding PCI device, while
+ * the error log is fetched through platform dependent function call.
+ */
+void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
+{
+	size_t loglen = 0;
+	struct eeh_dev *edev;
+
+	eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+	eeh_ops->configure_bridge(pe);
+	eeh_pe_restore_bars(pe);
+
+	pci_regs_buf[0] = 0;
+	eeh_pe_for_each_dev(pe, edev) {
+		loglen += eeh_gather_pci_data(edev, pci_regs_buf,
+				EEH_PCI_REGS_LOG_LEN);
+        }
+
+	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
+}
+
+/**
+ * eeh_token_to_phys - Convert EEH address token to phys address
+ * @token: I/O token, should be address in the form 0xA....
+ *
+ * This routine should be called to convert virtual I/O address
+ * to physical one.
+ */
+static inline unsigned long eeh_token_to_phys(unsigned long token)
+{
+	pte_t *ptep;
+	unsigned long pa;
+
+	ptep = find_linux_pte(init_mm.pgd, token);
+	if (!ptep)
+		return token;
+	pa = pte_pfn(*ptep) << PAGE_SHIFT;
+
+	return pa | (token & (PAGE_SIZE-1));
+}
+
+/**
+ * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
+ * @edev: eeh device
+ *
+ * Check for an EEH failure for the given device node.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Returns 0 if there has not been an EEH error; otherwise returns
+ * a non-zero value and queues up a slot isolation event notification.
+ *
+ * It is safe to call this routine in an interrupt context.
+ */
+int eeh_dev_check_failure(struct eeh_dev *edev)
+{
+	int ret;
+	unsigned long flags;
+	struct device_node *dn;
+	struct pci_dev *dev;
+	struct eeh_pe *pe;
+	int rc = 0;
+	const char *location;
+
+	eeh_stats.total_mmio_ffs++;
+
+	if (!eeh_subsystem_enabled)
+		return 0;
+
+	if (!edev) {
+		eeh_stats.no_dn++;
+		return 0;
+	}
+	dn = eeh_dev_to_of_node(edev);
+	dev = eeh_dev_to_pci_dev(edev);
+	pe = edev->pe;
+
+	/* Access to IO BARs might get this far and still not want checking. */
+	if (!pe) {
+		eeh_stats.ignored_check++;
+		pr_debug("EEH: Ignored check for %s %s\n",
+			eeh_pci_name(dev), dn->full_name);
+		return 0;
+	}
+
+	if (!pe->addr && !pe->config_addr) {
+		eeh_stats.no_cfg_addr++;
+		return 0;
+	}
+
+	/* If we already have a pending isolation event for this
+	 * slot, we know it's bad already, we don't need to check.
+	 * Do this checking under a lock; as multiple PCI devices
+	 * in one slot might report errors simultaneously, and we
+	 * only want one error recovery routine running.
+	 */
+	raw_spin_lock_irqsave(&confirm_error_lock, flags);
+	rc = 1;
+	if (pe->state & EEH_PE_ISOLATED) {
+		pe->check_count++;
+		if (pe->check_count % EEH_MAX_FAILS == 0) {
+			location = of_get_property(dn, "ibm,loc-code", NULL);
+			printk(KERN_ERR "EEH: %d reads ignored for recovering device at "
+				"location=%s driver=%s pci addr=%s\n",
+				pe->check_count, location,
+				eeh_driver_name(dev), eeh_pci_name(dev));
+			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
+				eeh_driver_name(dev));
+			dump_stack();
+		}
+		goto dn_unlock;
+	}
+
+	/*
+	 * Now test for an EEH failure.  This is VERY expensive.
+	 * Note that the eeh_config_addr may be a parent device
+	 * in the case of a device behind a bridge, or it may be
+	 * function zero of a multi-function device.
+	 * In any case they must share a common PHB.
+	 */
+	ret = eeh_ops->get_state(pe, NULL);
+
+	/* Note that config-io to empty slots may fail;
+	 * they are empty when they don't have children.
+	 * We will punt with the following conditions: Failure to get
+	 * PE's state, EEH not support and Permanently unavailable
+	 * state, PE is in good state.
+	 */
+	if ((ret < 0) ||
+	    (ret == EEH_STATE_NOT_SUPPORT) ||
+	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
+	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
+		eeh_stats.false_positives++;
+		pe->false_positives++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	eeh_stats.slot_resets++;
+ 
+	/* Avoid repeated reports of this failure, including problems
+	 * with other functions on this device, and functions under
+	 * bridges.
+	 */
+	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+
+	eeh_send_failure_event(pe);
+
+	/* Most EEH events are due to device driver bugs.  Having
+	 * a stack trace will help the device-driver authors figure
+	 * out what happened.  So print that out.
+	 */
+	WARN(1, "EEH: failure detected\n");
+	return 1;
+
+dn_unlock:
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+	return rc;
+}
+
+EXPORT_SYMBOL_GPL(eeh_dev_check_failure);
+
+/**
+ * eeh_check_failure - Check if all 1's data is due to EEH slot freeze
+ * @token: I/O token, should be address in the form 0xA....
+ * @val: value, should be all 1's (XXX why do we need this arg??)
+ *
+ * Check for an EEH failure at the given token address.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze event.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Note this routine is safe to call in an interrupt context.
+ */
+unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
+{
+	unsigned long addr;
+	struct eeh_dev *edev;
+
+	/* Finding the phys addr + pci device; this is pretty quick. */
+	addr = eeh_token_to_phys((unsigned long __force) token);
+	edev = eeh_addr_cache_get_dev(addr);
+	if (!edev) {
+		eeh_stats.no_device++;
+		return val;
+	}
+
+	eeh_dev_check_failure(edev);
+
+	pci_dev_put(eeh_dev_to_pci_dev(edev));
+	return val;
+}
+
+EXPORT_SYMBOL(eeh_check_failure);
+
+
+/**
+ * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
+ * @pe: EEH PE
+ *
+ * This routine should be called to reenable frozen MMIO or DMA
+ * so that it would work correctly again. It's useful while doing
+ * recovery or log collection on the indicated device.
+ */
+int eeh_pci_enable(struct eeh_pe *pe, int function)
+{
+	int rc;
+
+	rc = eeh_ops->set_option(pe, function);
+	if (rc)
+		pr_warning("%s: Unexpected state change %d on PHB#%d-PE#%x, err=%d\n",
+			__func__, function, pe->phb->global_number, pe->addr, rc);
+
+	rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
+	if (rc > 0 && (rc & EEH_STATE_MMIO_ENABLED) &&
+	   (function == EEH_OPT_THAW_MMIO))
+		return 0;
+
+	return rc;
+}
+
+/**
+ * pcibios_set_pcie_slot_reset - Set PCI-E reset state
+ * @dev: pci device struct
+ * @state: reset state to enter
+ *
+ * Return value:
+ * 	0 if success
+ */
+int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
+{
+	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
+	struct eeh_pe *pe = edev->pe;
+
+	if (!pe) {
+		pr_err("%s: No PE found on PCI device %s\n",
+			__func__, pci_name(dev));
+		return -EINVAL;
+	}
+
+	switch (state) {
+	case pcie_deassert_reset:
+		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
+		break;
+	case pcie_hot_reset:
+		eeh_ops->reset(pe, EEH_RESET_HOT);
+		break;
+	case pcie_warm_reset:
+		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	return 0;
+}
+
+/**
+ * eeh_set_pe_freset - Check the required reset for the indicated device
+ * @data: EEH device
+ * @flag: return value
+ *
+ * Each device might have its preferred reset type: fundamental or
+ * hot reset. The routine is used to collected the information for
+ * the indicated device and its children so that the bunch of the
+ * devices could be reset properly.
+ */
+static void *eeh_set_dev_freset(void *data, void *flag)
+{
+	struct pci_dev *dev;
+	unsigned int *freset = (unsigned int *)flag;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+
+	dev = eeh_dev_to_pci_dev(edev);
+	if (dev)
+		*freset |= dev->needs_freset;
+
+	return NULL;
+}
+
+/**
+ * eeh_reset_pe_once - Assert the pci #RST line for 1/4 second
+ * @pe: EEH PE
+ *
+ * Assert the PCI #RST line for 1/4 second.
+ */
+static void eeh_reset_pe_once(struct eeh_pe *pe)
+{
+	unsigned int freset = 0;
+
+	/* Determine type of EEH reset required for
+	 * Partitionable Endpoint, a hot-reset (1)
+	 * or a fundamental reset (3).
+	 * A fundamental reset required by any device under
+	 * Partitionable Endpoint trumps hot-reset.
+  	 */
+	eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
+
+	if (freset)
+		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
+	else
+		eeh_ops->reset(pe, EEH_RESET_HOT);
+
+	/* The PCI bus requires that the reset be held high for at least
+	 * a 100 milliseconds. We wait a bit longer 'just in case'.
+	 */
+#define PCI_BUS_RST_HOLD_TIME_MSEC 250
+	msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
+	
+	/* We might get hit with another EEH freeze as soon as the 
+	 * pci slot reset line is dropped. Make sure we don't miss
+	 * these, and clear the flag now.
+	 */
+	eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+
+	eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
+
+	/* After a PCI slot has been reset, the PCI Express spec requires
+	 * a 1.5 second idle time for the bus to stabilize, before starting
+	 * up traffic.
+	 */
+#define PCI_BUS_SETTLE_TIME_MSEC 1800
+	msleep(PCI_BUS_SETTLE_TIME_MSEC);
+}
+
+/**
+ * eeh_reset_pe - Reset the indicated PE
+ * @pe: EEH PE
+ *
+ * This routine should be called to reset indicated device, including
+ * PE. A PE might include multiple PCI devices and sometimes PCI bridges
+ * might be involved as well.
+ */
+int eeh_reset_pe(struct eeh_pe *pe)
+{
+	int i, rc;
+
+	/* Take three shots at resetting the bus */
+	for (i=0; i<3; i++) {
+		eeh_reset_pe_once(pe);
+
+		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
+		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
+			return 0;
+
+		if (rc < 0) {
+			pr_err("%s: Unrecoverable slot failure on PHB#%d-PE#%x",
+				__func__, pe->phb->global_number, pe->addr);
+			return -1;
+		}
+		pr_err("EEH: bus reset %d failed on PHB#%d-PE#%x, rc=%d\n",
+			i+1, pe->phb->global_number, pe->addr, rc);
+	}
+
+	return -1;
+}
+
+/**
+ * eeh_save_bars - Save device bars
+ * @edev: PCI device associated EEH device
+ *
+ * Save the values of the device bars. Unlike the restore
+ * routine, this routine is *not* recursive. This is because
+ * PCI devices are added individually; but, for the restore,
+ * an entire slot is reset at a time.
+ */
+void eeh_save_bars(struct eeh_dev *edev)
+{
+	int i;
+	struct device_node *dn;
+
+	if (!edev)
+		return;
+	dn = eeh_dev_to_of_node(edev);
+	
+	for (i = 0; i < 16; i++)
+		eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
+}
+
+/**
+ * eeh_ops_register - Register platform dependent EEH operations
+ * @ops: platform dependent EEH operations
+ *
+ * Register the platform dependent EEH operation callback
+ * functions. The platform should call this function before
+ * any other EEH operations.
+ */
+int __init eeh_ops_register(struct eeh_ops *ops)
+{
+	if (!ops->name) {
+		pr_warning("%s: Invalid EEH ops name for %p\n",
+			__func__, ops);
+		return -EINVAL;
+	}
+
+	if (eeh_ops && eeh_ops != ops) {
+		pr_warning("%s: EEH ops of platform %s already existing (%s)\n",
+			__func__, eeh_ops->name, ops->name);
+		return -EEXIST;
+	}
+
+	eeh_ops = ops;
+
+	return 0;
+}
+
+/**
+ * eeh_ops_unregister - Unreigster platform dependent EEH operations
+ * @name: name of EEH platform operations
+ *
+ * Unregister the platform dependent EEH operation callback
+ * functions.
+ */
+int __exit eeh_ops_unregister(const char *name)
+{
+	if (!name || !strlen(name)) {
+		pr_warning("%s: Invalid EEH ops name\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	if (eeh_ops && !strcmp(eeh_ops->name, name)) {
+		eeh_ops = NULL;
+		return 0;
+	}
+
+	return -EEXIST;
+}
+
+/**
+ * eeh_init - EEH initialization
+ *
+ * Initialize EEH by trying to enable it for all of the adapters in the system.
+ * As a side effect we can determine here if eeh is supported at all.
+ * Note that we leave EEH on so failed config cycles won't cause a machine
+ * check.  If a user turns off EEH for a particular adapter they are really
+ * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
+ * grant access to a slot if EEH isn't enabled, and so we always enable
+ * EEH for all slots/all devices.
+ *
+ * The eeh-force-off option disables EEH checking globally, for all slots.
+ * Even if force-off is set, the EEH hardware is still enabled, so that
+ * newer systems can boot.
+ */
+static int __init eeh_init(void)
+{
+	struct pci_controller *hose, *tmp;
+	struct device_node *phb;
+	int ret;
+
+	/* call platform initialization function */
+	if (!eeh_ops) {
+		pr_warning("%s: Platform EEH operation not found\n",
+			__func__);
+		return -EEXIST;
+	} else if ((ret = eeh_ops->init())) {
+		pr_warning("%s: Failed to call platform init function (%d)\n",
+			__func__, ret);
+		return ret;
+	}
+
+	raw_spin_lock_init(&confirm_error_lock);
+
+	/* Enable EEH for all adapters */
+	if (eeh_probe_mode_devtree()) {
+		list_for_each_entry_safe(hose, tmp,
+			&hose_list, list_node) {
+			phb = hose->dn;
+			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
+		}
+	}
+
+	if (eeh_subsystem_enabled)
+		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
+	else
+		pr_warning("EEH: No capable adapters found\n");
+
+	return ret;
+}
+
+core_initcall_sync(eeh_init);
+
+/**
+ * eeh_add_device_early - Enable EEH for the indicated device_node
+ * @dn: device node for which to set up EEH
+ *
+ * This routine must be used to perform EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ * This routine must be called before any i/o is performed to the
+ * adapter (inluding any config-space i/o).
+ * Whether this actually enables EEH or not for this device depends
+ * on the CEC architecture, type of the device, on earlier boot
+ * command-line arguments & etc.
+ */
+static void eeh_add_device_early(struct device_node *dn)
+{
+	struct pci_controller *phb;
+
+	if (!of_node_to_eeh_dev(dn))
+		return;
+	phb = of_node_to_eeh_dev(dn)->phb;
+
+	/* USB Bus children of PCI devices will not have BUID's */
+	if (NULL == phb || 0 == phb->buid)
+		return;
+
+	/* FIXME: hotplug support on POWERNV */
+	eeh_ops->of_probe(dn, NULL);
+}
+
+/**
+ * eeh_add_device_tree_early - Enable EEH for the indicated device
+ * @dn: device node
+ *
+ * This routine must be used to perform EEH initialization for the
+ * indicated PCI device that was added after system boot (e.g.
+ * hotplug, dlpar).
+ */
+void eeh_add_device_tree_early(struct device_node *dn)
+{
+	struct device_node *sib;
+
+	for_each_child_of_node(dn, sib)
+		eeh_add_device_tree_early(sib);
+	eeh_add_device_early(dn);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_tree_early);
+
+/**
+ * eeh_add_device_late - Perform EEH initialization for the indicated pci device
+ * @dev: pci device for which to set up EEH
+ *
+ * This routine must be used to complete EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ */
+static void eeh_add_device_late(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	struct eeh_dev *edev;
+
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+	pr_debug("EEH: Adding device %s\n", pci_name(dev));
+
+	dn = pci_device_to_OF_node(dev);
+	edev = of_node_to_eeh_dev(dn);
+	if (edev->pdev == dev) {
+		pr_debug("EEH: Already referenced !\n");
+		return;
+	}
+	WARN_ON(edev->pdev);
+
+	pci_dev_get(dev);
+	edev->pdev = dev;
+	dev->dev.archdata.edev = edev;
+
+	eeh_addr_cache_insert_dev(dev);
+}
+
+/**
+ * eeh_add_device_tree_late - Perform EEH initialization for the indicated PCI bus
+ * @bus: PCI bus
+ *
+ * This routine must be used to perform EEH initialization for PCI
+ * devices which are attached to the indicated PCI bus. The PCI bus
+ * is added after system boot through hotplug or dlpar.
+ */
+void eeh_add_device_tree_late(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+ 		eeh_add_device_late(dev);
+ 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+ 			struct pci_bus *subbus = dev->subordinate;
+ 			if (subbus)
+ 				eeh_add_device_tree_late(subbus);
+ 		}
+	}
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
+
+/**
+ * eeh_add_sysfs_files - Add EEH sysfs files for the indicated PCI bus
+ * @bus: PCI bus
+ *
+ * This routine must be used to add EEH sysfs files for PCI
+ * devices which are attached to the indicated PCI bus. The PCI bus
+ * is added after system boot through hotplug or dlpar.
+ */
+void eeh_add_sysfs_files(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		eeh_sysfs_add_device(dev);
+		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+			struct pci_bus *subbus = dev->subordinate;
+			if (subbus)
+				eeh_add_sysfs_files(subbus);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(eeh_add_sysfs_files);
+
+/**
+ * eeh_remove_device - Undo EEH setup for the indicated pci device
+ * @dev: pci device to be removed
+ * @purge_pe: remove the PE or not
+ *
+ * This routine should be called when a device is removed from
+ * a running system (e.g. by hotplug or dlpar).  It unregisters
+ * the PCI device from the EEH subsystem.  I/O errors affecting
+ * this device will no longer be detected after this call; thus,
+ * i/o errors affecting this slot may leave this device unusable.
+ */
+static void eeh_remove_device(struct pci_dev *dev, int purge_pe)
+{
+	struct eeh_dev *edev;
+
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+	edev = pci_dev_to_eeh_dev(dev);
+
+	/* Unregister the device with the EEH/PCI address search system */
+	pr_debug("EEH: Removing device %s\n", pci_name(dev));
+
+	if (!edev || !edev->pdev) {
+		pr_debug("EEH: Not referenced !\n");
+		return;
+	}
+	edev->pdev = NULL;
+	dev->dev.archdata.edev = NULL;
+	pci_dev_put(dev);
+
+	eeh_rmv_from_parent_pe(edev, purge_pe);
+	eeh_addr_cache_rmv_dev(dev);
+	eeh_sysfs_remove_device(dev);
+}
+
+/**
+ * eeh_remove_bus_device - Undo EEH setup for the indicated PCI device
+ * @dev: PCI device
+ * @purge_pe: remove the corresponding PE or not
+ *
+ * This routine must be called when a device is removed from the
+ * running system through hotplug or dlpar. The corresponding
+ * PCI address cache will be removed.
+ */
+void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe)
+{
+	struct pci_bus *bus = dev->subordinate;
+	struct pci_dev *child, *tmp;
+
+	eeh_remove_device(dev, purge_pe);
+
+	if (bus && dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
+			 eeh_remove_bus_device(child, purge_pe);
+	}
+}
+EXPORT_SYMBOL_GPL(eeh_remove_bus_device);
+
+static int proc_eeh_show(struct seq_file *m, void *v)
+{
+	if (0 == eeh_subsystem_enabled) {
+		seq_printf(m, "EEH Subsystem is globally disabled\n");
+		seq_printf(m, "eeh_total_mmio_ffs=%llu\n", eeh_stats.total_mmio_ffs);
+	} else {
+		seq_printf(m, "EEH Subsystem is enabled\n");
+		seq_printf(m,
+				"no device=%llu\n"
+				"no device node=%llu\n"
+				"no config address=%llu\n"
+				"check not wanted=%llu\n"
+				"eeh_total_mmio_ffs=%llu\n"
+				"eeh_false_positives=%llu\n"
+				"eeh_slot_resets=%llu\n",
+				eeh_stats.no_device,
+				eeh_stats.no_dn,
+				eeh_stats.no_cfg_addr,
+				eeh_stats.ignored_check,
+				eeh_stats.total_mmio_ffs,
+				eeh_stats.false_positives,
+				eeh_stats.slot_resets);
+	}
+
+	return 0;
+}
+
+static int proc_eeh_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_eeh_show, NULL);
+}
+
+static const struct file_operations proc_eeh_operations = {
+	.open      = proc_eeh_open,
+	.read      = seq_read,
+	.llseek    = seq_lseek,
+	.release   = single_release,
+};
+
+static int __init eeh_init_proc(void)
+{
+	if (machine_is(pseries))
+		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
+	return 0;
+}
+__initcall(eeh_init_proc);
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
new file mode 100644
index 0000000..5a4c879
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -0,0 +1,319 @@
+/*
+ * PCI address cache; allows the lookup of PCI devices based on I/O address
+ *
+ * Copyright IBM Corporation 2004
+ * Copyright Linas Vepstas <linas@austin.ibm.com> 2004
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+
+
+/**
+ * The pci address cache subsystem.  This subsystem places
+ * PCI device address resources into a red-black tree, sorted
+ * according to the address range, so that given only an i/o
+ * address, the corresponding PCI device can be **quickly**
+ * found. It is safe to perform an address lookup in an interrupt
+ * context; this ability is an important feature.
+ *
+ * Currently, the only customer of this code is the EEH subsystem;
+ * thus, this code has been somewhat tailored to suit EEH better.
+ * In particular, the cache does *not* hold the addresses of devices
+ * for which EEH is not enabled.
+ *
+ * (Implementation Note: The RB tree seems to be better/faster
+ * than any hash algo I could think of for this problem, even
+ * with the penalty of slow pointer chases for d-cache misses).
+ */
+struct pci_io_addr_range {
+	struct rb_node rb_node;
+	unsigned long addr_lo;
+	unsigned long addr_hi;
+	struct eeh_dev *edev;
+	struct pci_dev *pcidev;
+	unsigned int flags;
+};
+
+static struct pci_io_addr_cache {
+	struct rb_root rb_root;
+	spinlock_t piar_lock;
+} pci_io_addr_cache_root;
+
+static inline struct eeh_dev *__eeh_addr_cache_get_device(unsigned long addr)
+{
+	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
+
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (addr < piar->addr_lo) {
+			n = n->rb_left;
+		} else {
+			if (addr > piar->addr_hi) {
+				n = n->rb_right;
+			} else {
+				pci_dev_get(piar->pcidev);
+				return piar->edev;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_addr_cache_get_dev - Get device, given only address
+ * @addr: mmio (PIO) phys address or i/o port number
+ *
+ * Given an mmio phys address, or a port number, find a pci device
+ * that implements this address.  Be sure to pci_dev_put the device
+ * when finished.  I/O port numbers are assumed to be offset
+ * from zero (that is, they do *not* have pci_io_addr added in).
+ * It is safe to call this function within an interrupt.
+ */
+struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr)
+{
+	struct eeh_dev *edev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	edev = __eeh_addr_cache_get_device(addr);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+	return edev;
+}
+
+#ifdef DEBUG
+/*
+ * Handy-dandy debug print routine, does nothing more
+ * than print out the contents of our addr cache.
+ */
+static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
+{
+	struct rb_node *n;
+	int cnt = 0;
+
+	n = rb_first(&cache->rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+		pr_debug("PCI: %s addr range %d [%lx-%lx]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
+		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
+		cnt++;
+		n = rb_next(n);
+	}
+}
+#endif
+
+/* Insert address range into the rb tree. */
+static struct pci_io_addr_range *
+eeh_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
+		      unsigned long ahi, unsigned int flags)
+{
+	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct pci_io_addr_range *piar;
+
+	/* Walk tree, find a place to insert into tree */
+	while (*p) {
+		parent = *p;
+		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
+		if (ahi < piar->addr_lo) {
+			p = &parent->rb_left;
+		} else if (alo > piar->addr_hi) {
+			p = &parent->rb_right;
+		} else {
+			if (dev != piar->pcidev ||
+			    alo != piar->addr_lo || ahi != piar->addr_hi) {
+				pr_warning("PIAR: overlapping address range\n");
+			}
+			return piar;
+		}
+	}
+	piar = kzalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
+	if (!piar)
+		return NULL;
+
+	pci_dev_get(dev);
+	piar->addr_lo = alo;
+	piar->addr_hi = ahi;
+	piar->edev = pci_dev_to_eeh_dev(dev);
+	piar->pcidev = dev;
+	piar->flags = flags;
+
+#ifdef DEBUG
+	pr_debug("PIAR: insert range=[%lx:%lx] dev=%s\n",
+	                  alo, ahi, pci_name(dev));
+#endif
+
+	rb_link_node(&piar->rb_node, parent, p);
+	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
+
+	return piar;
+}
+
+static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	struct eeh_dev *edev;
+	int i;
+
+	dn = pci_device_to_OF_node(dev);
+	if (!dn) {
+		pr_warning("PCI: no pci dn found for dev=%s\n", pci_name(dev));
+		return;
+	}
+
+	edev = of_node_to_eeh_dev(dn);
+	if (!edev) {
+		pr_warning("PCI: no EEH dev found for dn=%s\n",
+			dn->full_name);
+		return;
+	}
+
+	/* Skip any devices for which EEH is not enabled. */
+	if (!edev->pe) {
+#ifdef DEBUG
+		pr_info("PCI: skip building address cache for=%s - %s\n",
+			pci_name(dev), dn->full_name);
+#endif
+		return;
+	}
+
+	/* Walk resources on this device, poke them into the tree */
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		unsigned long start = pci_resource_start(dev,i);
+		unsigned long end = pci_resource_end(dev,i);
+		unsigned int flags = pci_resource_flags(dev,i);
+
+		/* We are interested only bus addresses, not dma or other stuff */
+		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+			continue;
+		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
+			 continue;
+		eeh_addr_cache_insert(dev, start, end, flags);
+	}
+}
+
+/**
+ * eeh_addr_cache_insert_dev - Add a device to the address cache
+ * @dev: PCI device whose I/O addresses we are interested in.
+ *
+ * In order to support the fast lookup of devices based on addresses,
+ * we maintain a cache of devices that can be quickly searched.
+ * This routine adds a device to that cache.
+ */
+void eeh_addr_cache_insert_dev(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	/* Ignore PCI bridges */
+	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
+		return;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__eeh_addr_cache_insert_dev(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
+{
+	struct rb_node *n;
+
+restart:
+	n = rb_first(&pci_io_addr_cache_root.rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (piar->pcidev == dev) {
+			rb_erase(n, &pci_io_addr_cache_root.rb_root);
+			pci_dev_put(piar->pcidev);
+			kfree(piar);
+			goto restart;
+		}
+		n = rb_next(n);
+	}
+}
+
+/**
+ * eeh_addr_cache_rmv_dev - remove pci device from addr cache
+ * @dev: device to remove
+ *
+ * Remove a device from the addr-cache tree.
+ * This is potentially expensive, since it will walk
+ * the tree multiple times (once per resource).
+ * But so what; device removal doesn't need to be that fast.
+ */
+void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__eeh_addr_cache_rmv_dev(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+/**
+ * eeh_addr_cache_build - Build a cache of I/O addresses
+ *
+ * Build a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ * This routine scans all pci busses to build the cache.
+ * Must be run late in boot process, after the pci controllers
+ * have been scanned for devices (after all device resources are known).
+ */
+void __init eeh_addr_cache_build(void)
+{
+	struct device_node *dn;
+	struct eeh_dev *edev;
+	struct pci_dev *dev = NULL;
+
+	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+
+	for_each_pci_dev(dev) {
+		eeh_addr_cache_insert_dev(dev);
+
+		dn = pci_device_to_OF_node(dev);
+		if (!dn)
+			continue;
+
+		edev = of_node_to_eeh_dev(dn);
+		if (!edev)
+			continue;
+
+		pci_dev_get(dev);  /* matching put is in eeh_remove_device() */
+		dev->dev.archdata.edev = edev;
+		edev->pdev = dev;
+
+		eeh_sysfs_add_device(dev);
+	}
+
+#ifdef DEBUG
+	/* Verify tree built up above, echo back the list of addrs. */
+	eeh_addr_cache_print(&pci_io_addr_cache_root);
+#endif
+}
+
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
new file mode 100644
index 0000000..1efa28f
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -0,0 +1,112 @@
+/*
+ * The file intends to implement dynamic creation of EEH device, which will
+ * be bound with OF node and PCI device simutaneously. The EEH devices would
+ * be foundamental information for EEH core components to work proerly. Besides,
+ * We have to support multiple situations where dynamic creation of EEH device
+ * is required:
+ *
+ * 1) Before PCI emunation starts, we need create EEH devices according to the
+ *    PCI sensitive OF nodes.
+ * 2) When PCI emunation is done, we need do the binding between PCI device and
+ *    the associated EEH device.
+ * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
+ *    will be created while PCI sensitive OF node is detected from DR.
+ * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
+ *    PHB is newly inserted, we also need create EEH devices accordingly.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/export.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+
+/**
+ * eeh_dev_init - Create EEH device according to OF node
+ * @dn: device node
+ * @data: PHB
+ *
+ * It will create EEH device according to the given OF node. The function
+ * might be called by PCI emunation, DR, PHB hotplug.
+ */
+void *eeh_dev_init(struct device_node *dn, void *data)
+{
+	struct pci_controller *phb = data;
+	struct eeh_dev *edev;
+
+	/* Allocate EEH device */
+	edev = kzalloc(sizeof(*edev), GFP_KERNEL);
+	if (!edev) {
+		pr_warning("%s: out of memory\n", __func__);
+		return NULL;
+	}
+
+	/* Associate EEH device with OF node */
+	PCI_DN(dn)->edev = edev;
+	edev->dn  = dn;
+	edev->phb = phb;
+	INIT_LIST_HEAD(&edev->list);
+
+	return NULL;
+}
+
+/**
+ * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
+ * @phb: PHB
+ *
+ * Scan the PHB OF node and its child association, then create the
+ * EEH devices accordingly
+ */
+void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
+{
+	struct device_node *dn = phb->dn;
+
+	/* EEH PE for PHB */
+	eeh_phb_pe_create(phb);
+
+	/* EEH device for PHB */
+	eeh_dev_init(dn, phb);
+
+	/* EEH devices for children OF nodes */
+	traverse_pci_devices(dn, eeh_dev_init, phb);
+}
+
+/**
+ * eeh_dev_phb_init - Create EEH devices for devices included in existing PHBs
+ *
+ * Scan all the existing PHBs and create EEH devices for their OF
+ * nodes and their children OF nodes
+ */
+static int __init eeh_dev_phb_init(void)
+{
+	struct pci_controller *phb, *tmp;
+
+	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
+		eeh_dev_phb_init_dynamic(phb);
+
+	pr_info("EEH: devices created\n");
+
+	return 0;
+}
+
+core_initcall(eeh_dev_phb_init);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
new file mode 100644
index 0000000..a3fefb6
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -0,0 +1,552 @@
+/*
+ * PCI Error Recovery Driver for RPA-compliant PPC64 platform.
+ * Copyright IBM Corp. 2004 2005
+ * Copyright Linas Vepstas <linas@linas.org> 2004, 2005
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
+ */
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/ppc-pci.h>
+#include <asm/pci-bridge.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+
+/**
+ * eeh_pcid_name - Retrieve name of PCI device driver
+ * @pdev: PCI device
+ *
+ * This routine is used to retrieve the name of PCI device driver
+ * if that's valid.
+ */
+static inline const char *eeh_pcid_name(struct pci_dev *pdev)
+{
+	if (pdev && pdev->dev.driver)
+		return pdev->dev.driver->name;
+	return "";
+}
+
+/**
+ * eeh_pcid_get - Get the PCI device driver
+ * @pdev: PCI device
+ *
+ * The function is used to retrieve the PCI device driver for
+ * the indicated PCI device. Besides, we will increase the reference
+ * of the PCI device driver to prevent that being unloaded on
+ * the fly. Otherwise, kernel crash would be seen.
+ */
+static inline struct pci_driver *eeh_pcid_get(struct pci_dev *pdev)
+{
+	if (!pdev || !pdev->driver)
+		return NULL;
+
+	if (!try_module_get(pdev->driver->driver.owner))
+		return NULL;
+
+	return pdev->driver;
+}
+
+/**
+ * eeh_pcid_put - Dereference on the PCI device driver
+ * @pdev: PCI device
+ *
+ * The function is called to do dereference on the PCI device
+ * driver of the indicated PCI device.
+ */
+static inline void eeh_pcid_put(struct pci_dev *pdev)
+{
+	if (!pdev || !pdev->driver)
+		return;
+
+	module_put(pdev->driver->driver.owner);
+}
+
+#if 0
+static void print_device_node_tree(struct pci_dn *pdn, int dent)
+{
+	int i;
+	struct device_node *pc;
+
+	if (!pdn)
+		return;
+	for (i = 0; i < dent; i++)
+		printk(" ");
+	printk("dn=%s mode=%x \tcfg_addr=%x pe_addr=%x \tfull=%s\n",
+		pdn->node->name, pdn->eeh_mode, pdn->eeh_config_addr,
+		pdn->eeh_pe_config_addr, pdn->node->full_name);
+	dent += 3;
+	pc = pdn->node->child;
+	while (pc) {
+		print_device_node_tree(PCI_DN(pc), dent);
+		pc = pc->sibling;
+	}
+}
+#endif
+
+/**
+ * eeh_disable_irq - Disable interrupt for the recovering device
+ * @dev: PCI device
+ *
+ * This routine must be called when reporting temporary or permanent
+ * error to the particular PCI device to disable interrupt of that
+ * device. If the device has enabled MSI or MSI-X interrupt, we needn't
+ * do real work because EEH should freeze DMA transfers for those PCI
+ * devices encountering EEH errors, which includes MSI or MSI-X.
+ */
+static void eeh_disable_irq(struct pci_dev *dev)
+{
+	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
+
+	/* Don't disable MSI and MSI-X interrupts. They are
+	 * effectively disabled by the DMA Stopped state
+	 * when an EEH error occurs.
+	 */
+	if (dev->msi_enabled || dev->msix_enabled)
+		return;
+
+	if (!irq_has_action(dev->irq))
+		return;
+
+	edev->mode |= EEH_DEV_IRQ_DISABLED;
+	disable_irq_nosync(dev->irq);
+}
+
+/**
+ * eeh_enable_irq - Enable interrupt for the recovering device
+ * @dev: PCI device
+ *
+ * This routine must be called to enable interrupt while failed
+ * device could be resumed.
+ */
+static void eeh_enable_irq(struct pci_dev *dev)
+{
+	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
+
+	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
+		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
+		enable_irq(dev->irq);
+	}
+}
+
+/**
+ * eeh_report_error - Report pci error to each device driver
+ * @data: eeh device
+ * @userdata: return value
+ * 
+ * Report an EEH error to each device driver, collect up and 
+ * merge the device driver responses. Cumulative response 
+ * passed back in "userdata".
+ */
+static void *eeh_report_error(void *data, void *userdata)
+{
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	enum pci_ers_result rc, *res = userdata;
+	struct pci_driver *driver;
+
+	/* We might not have the associated PCI device,
+	 * then we should continue for next one.
+	 */
+	if (!dev) return NULL;
+	dev->error_state = pci_channel_io_frozen;
+
+	driver = eeh_pcid_get(dev);
+	if (!driver) return NULL;
+
+	eeh_disable_irq(dev);
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->error_detected) {
+		eeh_pcid_put(dev);
+		return NULL;
+	}
+
+	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+
+	eeh_pcid_put(dev);
+	return NULL;
+}
+
+/**
+ * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
+ * @data: eeh device
+ * @userdata: return value
+ *
+ * Tells each device driver that IO ports, MMIO and config space I/O
+ * are now enabled. Collects up and merges the device driver responses.
+ * Cumulative response passed back in "userdata".
+ */
+static void *eeh_report_mmio_enabled(void *data, void *userdata)
+{
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	enum pci_ers_result rc, *res = userdata;
+	struct pci_driver *driver;
+
+	driver = eeh_pcid_get(dev);
+	if (!driver) return NULL;
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->mmio_enabled) {
+		eeh_pcid_put(dev);
+		return NULL;
+	}
+
+	rc = driver->err_handler->mmio_enabled(dev);
+
+	/* A driver that needs a reset trumps all others */
+	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+
+	eeh_pcid_put(dev);
+	return NULL;
+}
+
+/**
+ * eeh_report_reset - Tell device that slot has been reset
+ * @data: eeh device
+ * @userdata: return value
+ *
+ * This routine must be called while EEH tries to reset particular
+ * PCI device so that the associated PCI device driver could take
+ * some actions, usually to save data the driver needs so that the
+ * driver can work again while the device is recovered.
+ */
+static void *eeh_report_reset(void *data, void *userdata)
+{
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	enum pci_ers_result rc, *res = userdata;
+	struct pci_driver *driver;
+
+	if (!dev) return NULL;
+	dev->error_state = pci_channel_io_normal;
+
+	driver = eeh_pcid_get(dev);
+	if (!driver) return NULL;
+
+	eeh_enable_irq(dev);
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->slot_reset) {
+		eeh_pcid_put(dev);
+		return NULL;
+	}
+
+	rc = driver->err_handler->slot_reset(dev);
+	if ((*res == PCI_ERS_RESULT_NONE) ||
+	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
+	if (*res == PCI_ERS_RESULT_DISCONNECT &&
+	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+
+	eeh_pcid_put(dev);
+	return NULL;
+}
+
+/**
+ * eeh_report_resume - Tell device to resume normal operations
+ * @data: eeh device
+ * @userdata: return value
+ *
+ * This routine must be called to notify the device driver that it
+ * could resume so that the device driver can do some initialization
+ * to make the recovered device work again.
+ */
+static void *eeh_report_resume(void *data, void *userdata)
+{
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_driver *driver;
+
+	if (!dev) return NULL;
+	dev->error_state = pci_channel_io_normal;
+
+	driver = eeh_pcid_get(dev);
+	if (!driver) return NULL;
+
+	eeh_enable_irq(dev);
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->resume) {
+		eeh_pcid_put(dev);
+		return NULL;
+	}
+
+	driver->err_handler->resume(dev);
+
+	eeh_pcid_put(dev);
+	return NULL;
+}
+
+/**
+ * eeh_report_failure - Tell device driver that device is dead.
+ * @data: eeh device
+ * @userdata: return value
+ *
+ * This informs the device driver that the device is permanently
+ * dead, and that no further recovery attempts will be made on it.
+ */
+static void *eeh_report_failure(void *data, void *userdata)
+{
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_driver *driver;
+
+	if (!dev) return NULL;
+	dev->error_state = pci_channel_io_perm_failure;
+
+	driver = eeh_pcid_get(dev);
+	if (!driver) return NULL;
+
+	eeh_disable_irq(dev);
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->error_detected) {
+		eeh_pcid_put(dev);
+		return NULL;
+	}
+
+	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
+
+	eeh_pcid_put(dev);
+	return NULL;
+}
+
+/**
+ * eeh_reset_device - Perform actual reset of a pci slot
+ * @pe: EEH PE
+ * @bus: PCI bus corresponding to the isolcated slot
+ *
+ * This routine must be called to do reset on the indicated PE.
+ * During the reset, udev might be invoked because those affected
+ * PCI devices will be removed and then added.
+ */
+static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
+{
+	int cnt, rc;
+
+	/* pcibios will clear the counter; save the value */
+	cnt = pe->freeze_count;
+
+	/*
+	 * We don't remove the corresponding PE instances because
+	 * we need the information afterwords. The attached EEH
+	 * devices are expected to be attached soon when calling
+	 * into pcibios_add_pci_devices().
+	 */
+	if (bus)
+		__pcibios_remove_pci_devices(bus, 0);
+
+	/* Reset the pci controller. (Asserts RST#; resets config space).
+	 * Reconfigure bridges and devices. Don't try to bring the system
+	 * up if the reset failed for some reason.
+	 */
+	rc = eeh_reset_pe(pe);
+	if (rc)
+		return rc;
+
+	/* Restore PE */
+	eeh_ops->configure_bridge(pe);
+	eeh_pe_restore_bars(pe);
+
+	/* Give the system 5 seconds to finish running the user-space
+	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 
+	 * this is a hack, but if we don't do this, and try to bring 
+	 * the device up before the scripts have taken it down, 
+	 * potentially weird things happen.
+	 */
+	if (bus) {
+		ssleep(5);
+		pcibios_add_pci_devices(bus);
+	}
+	pe->freeze_count = cnt;
+
+	return 0;
+}
+
+/* The longest amount of time to wait for a pci device
+ * to come back on line, in seconds.
+ */
+#define MAX_WAIT_FOR_RECOVERY 150
+
+/**
+ * eeh_handle_event - Reset a PCI device after hard lockup.
+ * @pe: EEH PE
+ *
+ * While PHB detects address or data parity errors on particular PCI
+ * slot, the associated PE will be frozen. Besides, DMA's occurring
+ * to wild addresses (which usually happen due to bugs in device
+ * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
+ * #PERR or other misc PCI-related errors also can trigger EEH errors.
+ *
+ * Recovery process consists of unplugging the device driver (which
+ * generated hotplug events to userspace), then issuing a PCI #RST to
+ * the device, then reconfiguring the PCI config space for all bridges
+ * & devices under this slot, and then finally restarting the device
+ * drivers (which cause a second set of hotplug events to go out to
+ * userspace).
+ */
+void eeh_handle_event(struct eeh_pe *pe)
+{
+	struct pci_bus *frozen_bus;
+	int rc = 0;
+	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
+
+	frozen_bus = eeh_pe_bus_get(pe);
+	if (!frozen_bus) {
+		pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
+			__func__, pe->phb->global_number, pe->addr);
+		return;
+	}
+
+	pe->freeze_count++;
+	if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES)
+		goto excess_failures;
+	pr_warning("EEH: This PCI device has failed %d times in the last hour\n",
+		pe->freeze_count);
+
+	/* Walk the various device drivers attached to this slot through
+	 * a reset sequence, giving each an opportunity to do what it needs
+	 * to accomplish the reset.  Each child gets a report of the
+	 * status ... if any child can't handle the reset, then the entire
+	 * slot is dlpar removed and added.
+	 */
+	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
+
+	/* Get the current PCI slot state. This can take a long time,
+	 * sometimes over 3 seconds for certain systems.
+	 */
+	rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000);
+	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
+		printk(KERN_WARNING "EEH: Permanent failure\n");
+		goto hard_fail;
+	}
+
+	/* Since rtas may enable MMIO when posting the error log,
+	 * don't post the error log until after all dev drivers
+	 * have been informed.
+	 */
+	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
+
+	/* If all device drivers were EEH-unaware, then shut
+	 * down all of the device drivers, and hope they
+	 * go down willingly, without panicing the system.
+	 */
+	if (result == PCI_ERS_RESULT_NONE) {
+		rc = eeh_reset_device(pe, frozen_bus);
+		if (rc) {
+			printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc);
+			goto hard_fail;
+		}
+	}
+
+	/* If all devices reported they can proceed, then re-enable MMIO */
+	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
+		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+
+		if (rc < 0)
+			goto hard_fail;
+		if (rc) {
+			result = PCI_ERS_RESULT_NEED_RESET;
+		} else {
+			result = PCI_ERS_RESULT_NONE;
+			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
+		}
+	}
+
+	/* If all devices reported they can proceed, then re-enable DMA */
+	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
+		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
+
+		if (rc < 0)
+			goto hard_fail;
+		if (rc)
+			result = PCI_ERS_RESULT_NEED_RESET;
+		else
+			result = PCI_ERS_RESULT_RECOVERED;
+	}
+
+	/* If any device has a hard failure, then shut off everything. */
+	if (result == PCI_ERS_RESULT_DISCONNECT) {
+		printk(KERN_WARNING "EEH: Device driver gave up\n");
+		goto hard_fail;
+	}
+
+	/* If any device called out for a reset, then reset the slot */
+	if (result == PCI_ERS_RESULT_NEED_RESET) {
+		rc = eeh_reset_device(pe, NULL);
+		if (rc) {
+			printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc);
+			goto hard_fail;
+		}
+		result = PCI_ERS_RESULT_NONE;
+		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
+	}
+
+	/* All devices should claim they have recovered by now. */
+	if ((result != PCI_ERS_RESULT_RECOVERED) &&
+	    (result != PCI_ERS_RESULT_NONE)) {
+		printk(KERN_WARNING "EEH: Not recovered\n");
+		goto hard_fail;
+	}
+
+	/* Tell all device drivers that they can resume operations */
+	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
+
+	return;
+	
+excess_failures:
+	/*
+	 * About 90% of all real-life EEH failures in the field
+	 * are due to poorly seated PCI cards. Only 10% or so are
+	 * due to actual, failed cards.
+	 */
+	pr_err("EEH: PHB#%d-PE#%x has failed %d times in the\n"
+	       "last hour and has been permanently disabled.\n"
+	       "Please try reseating or replacing it.\n",
+		pe->phb->global_number, pe->addr,
+		pe->freeze_count);
+	goto perm_error;
+
+hard_fail:
+	pr_err("EEH: Unable to recover from failure from PHB#%d-PE#%x.\n"
+	       "Please try reseating or replacing it\n",
+		pe->phb->global_number, pe->addr);
+
+perm_error:
+	eeh_slot_error_detail(pe, EEH_LOG_PERM);
+
+	/* Notify all devices that they're about to go down. */
+	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
+
+	/* Shut down the device drivers for good. */
+	if (frozen_bus)
+		pcibios_remove_pci_devices(frozen_bus);
+}
+
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
new file mode 100644
index 0000000..185bedd
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -0,0 +1,142 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
+ */
+
+#include <linux/delay.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/kthread.h>
+#include <asm/eeh_event.h>
+#include <asm/ppc-pci.h>
+
+/** Overview:
+ *  EEH error states may be detected within exception handlers;
+ *  however, the recovery processing needs to occur asynchronously
+ *  in a normal kernel context and not an interrupt context.
+ *  This pair of routines creates an event and queues it onto a
+ *  work-queue, where a worker thread can drive recovery.
+ */
+
+/* EEH event workqueue setup. */
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
+LIST_HEAD(eeh_eventlist);
+static void eeh_thread_launcher(struct work_struct *);
+DECLARE_WORK(eeh_event_wq, eeh_thread_launcher);
+
+/* Serialize reset sequences for a given pci device */
+DEFINE_MUTEX(eeh_event_mutex);
+
+/**
+ * eeh_event_handler - Dispatch EEH events.
+ * @dummy - unused
+ *
+ * The detection of a frozen slot can occur inside an interrupt,
+ * where it can be hard to do anything about it.  The goal of this
+ * routine is to pull these detection events out of the context
+ * of the interrupt handler, and re-dispatch them for processing
+ * at a later time in a normal context.
+ */
+static int eeh_event_handler(void * dummy)
+{
+	unsigned long flags;
+	struct eeh_event *event;
+	struct eeh_pe *pe;
+
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	event = NULL;
+
+	/* Unqueue the event, get ready to process. */
+	if (!list_empty(&eeh_eventlist)) {
+		event = list_entry(eeh_eventlist.next, struct eeh_event, list);
+		list_del(&event->list);
+	}
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+
+	if (event == NULL)
+		return 0;
+
+	/* Serialize processing of EEH events */
+	mutex_lock(&eeh_event_mutex);
+	pe = event->pe;
+	eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+	pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n",
+		pe->phb->global_number, pe->addr);
+
+	set_current_state(TASK_INTERRUPTIBLE);	/* Don't add to load average */
+	eeh_handle_event(pe);
+	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
+
+	kfree(event);
+	mutex_unlock(&eeh_event_mutex);
+
+	/* If there are no new errors after an hour, clear the counter. */
+	if (pe && pe->freeze_count > 0) {
+		msleep_interruptible(3600*1000);
+		if (pe->freeze_count > 0)
+			pe->freeze_count--;
+
+	}
+
+	return 0;
+}
+
+/**
+ * eeh_thread_launcher - Start kernel thread to handle EEH events
+ * @dummy - unused
+ *
+ * This routine is called to start the kernel thread for processing
+ * EEH event.
+ */
+static void eeh_thread_launcher(struct work_struct *dummy)
+{
+	if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd")))
+		printk(KERN_ERR "Failed to start EEH daemon\n");
+}
+
+/**
+ * eeh_send_failure_event - Generate a PCI error event
+ * @pe: EEH PE
+ *
+ * This routine can be called within an interrupt context;
+ * the actual event will be delivered in a normal context
+ * (from a workqueue).
+ */
+int eeh_send_failure_event(struct eeh_pe *pe)
+{
+	unsigned long flags;
+	struct eeh_event *event;
+
+	event = kzalloc(sizeof(*event), GFP_ATOMIC);
+	if (!event) {
+		pr_err("EEH: out of memory, event not handled\n");
+		return -ENOMEM;
+	}
+	event->pe = pe;
+
+	/* We may or may not be called in an interrupt context */
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	list_add(&event->list, &eeh_eventlist);
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+
+	schedule_work(&eeh_event_wq);
+
+	return 0;
+}
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
new file mode 100644
index 0000000..9d4a9e8
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -0,0 +1,653 @@
+/*
+ * The file intends to implement PE based on the information from
+ * platforms. Basically, there have 3 types of PEs: PHB/Bus/Device.
+ * All the PEs should be organized as hierarchy tree. The first level
+ * of the tree will be associated to existing PHBs since the particular
+ * PE is only meaningful in one PHB domain.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/export.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+
+static LIST_HEAD(eeh_phb_pe);
+
+/**
+ * eeh_pe_alloc - Allocate PE
+ * @phb: PCI controller
+ * @type: PE type
+ *
+ * Allocate PE instance dynamically.
+ */
+static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
+{
+	struct eeh_pe *pe;
+
+	/* Allocate PHB PE */
+	pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
+	if (!pe) return NULL;
+
+	/* Initialize PHB PE */
+	pe->type = type;
+	pe->phb = phb;
+	INIT_LIST_HEAD(&pe->child_list);
+	INIT_LIST_HEAD(&pe->child);
+	INIT_LIST_HEAD(&pe->edevs);
+
+	return pe;
+}
+
+/**
+ * eeh_phb_pe_create - Create PHB PE
+ * @phb: PCI controller
+ *
+ * The function should be called while the PHB is detected during
+ * system boot or PCI hotplug in order to create PHB PE.
+ */
+int eeh_phb_pe_create(struct pci_controller *phb)
+{
+	struct eeh_pe *pe;
+
+	/* Allocate PHB PE */
+	pe = eeh_pe_alloc(phb, EEH_PE_PHB);
+	if (!pe) {
+		pr_err("%s: out of memory!\n", __func__);
+		return -ENOMEM;
+	}
+
+	/* Put it into the list */
+	eeh_lock();
+	list_add_tail(&pe->child, &eeh_phb_pe);
+	eeh_unlock();
+
+	pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number);
+
+	return 0;
+}
+
+/**
+ * eeh_phb_pe_get - Retrieve PHB PE based on the given PHB
+ * @phb: PCI controller
+ *
+ * The overall PEs form hierarchy tree. The first layer of the
+ * hierarchy tree is composed of PHB PEs. The function is used
+ * to retrieve the corresponding PHB PE according to the given PHB.
+ */
+static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
+{
+	struct eeh_pe *pe;
+
+	list_for_each_entry(pe, &eeh_phb_pe, child) {
+		/*
+		 * Actually, we needn't check the type since
+		 * the PE for PHB has been determined when that
+		 * was created.
+		 */
+		if ((pe->type & EEH_PE_PHB) && pe->phb == phb)
+			return pe;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_next - Retrieve the next PE in the tree
+ * @pe: current PE
+ * @root: root PE
+ *
+ * The function is used to retrieve the next PE in the
+ * hierarchy PE tree.
+ */
+static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
+				  struct eeh_pe *root)
+{
+	struct list_head *next = pe->child_list.next;
+
+	if (next == &pe->child_list) {
+		while (1) {
+			if (pe == root)
+				return NULL;
+			next = pe->child.next;
+			if (next != &pe->parent->child_list)
+				break;
+			pe = pe->parent;
+		}
+	}
+
+	return list_entry(next, struct eeh_pe, child);
+}
+
+/**
+ * eeh_pe_traverse - Traverse PEs in the specified PHB
+ * @root: root PE
+ * @fn: callback
+ * @flag: extra parameter to callback
+ *
+ * The function is used to traverse the specified PE and its
+ * child PEs. The traversing is to be terminated once the
+ * callback returns something other than NULL, or no more PEs
+ * to be traversed.
+ */
+static void *eeh_pe_traverse(struct eeh_pe *root,
+			eeh_traverse_func fn, void *flag)
+{
+	struct eeh_pe *pe;
+	void *ret;
+
+	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+		ret = fn(pe, flag);
+		if (ret) return ret;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_dev_traverse - Traverse the devices from the PE
+ * @root: EEH PE
+ * @fn: function callback
+ * @flag: extra parameter to callback
+ *
+ * The function is used to traverse the devices of the specified
+ * PE and its child PEs.
+ */
+void *eeh_pe_dev_traverse(struct eeh_pe *root,
+		eeh_traverse_func fn, void *flag)
+{
+	struct eeh_pe *pe;
+	struct eeh_dev *edev;
+	void *ret;
+
+	if (!root) {
+		pr_warning("%s: Invalid PE %p\n", __func__, root);
+		return NULL;
+	}
+
+	eeh_lock();
+
+	/* Traverse root PE */
+	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+		eeh_pe_for_each_dev(pe, edev) {
+			ret = fn(edev, flag);
+			if (ret) {
+				eeh_unlock();
+				return ret;
+			}
+		}
+	}
+
+	eeh_unlock();
+
+	return NULL;
+}
+
+/**
+ * __eeh_pe_get - Check the PE address
+ * @data: EEH PE
+ * @flag: EEH device
+ *
+ * For one particular PE, it can be identified by PE address
+ * or tranditional BDF address. BDF address is composed of
+ * Bus/Device/Function number. The extra data referred by flag
+ * indicates which type of address should be used.
+ */
+static void *__eeh_pe_get(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	struct eeh_dev *edev = (struct eeh_dev *)flag;
+
+	/* Unexpected PHB PE */
+	if (pe->type & EEH_PE_PHB)
+		return NULL;
+
+	/* We prefer PE address */
+	if (edev->pe_config_addr &&
+	   (edev->pe_config_addr == pe->addr))
+		return pe;
+
+	/* Try BDF address */
+	if (edev->pe_config_addr &&
+	   (edev->config_addr == pe->config_addr))
+		return pe;
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_get - Search PE based on the given address
+ * @edev: EEH device
+ *
+ * Search the corresponding PE based on the specified address which
+ * is included in the eeh device. The function is used to check if
+ * the associated PE has been created against the PE address. It's
+ * notable that the PE address has 2 format: traditional PE address
+ * which is composed of PCI bus/device/function number, or unified
+ * PE address.
+ */
+static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
+{
+	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
+	struct eeh_pe *pe;
+
+	pe = eeh_pe_traverse(root, __eeh_pe_get, edev);
+
+	return pe;
+}
+
+/**
+ * eeh_pe_get_parent - Retrieve the parent PE
+ * @edev: EEH device
+ *
+ * The whole PEs existing in the system are organized as hierarchy
+ * tree. The function is used to retrieve the parent PE according
+ * to the parent EEH device.
+ */
+static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
+{
+	struct device_node *dn;
+	struct eeh_dev *parent;
+
+	/*
+	 * It might have the case for the indirect parent
+	 * EEH device already having associated PE, but
+	 * the direct parent EEH device doesn't have yet.
+	 */
+	dn = edev->dn->parent;
+	while (dn) {
+		/* We're poking out of PCI territory */
+		if (!PCI_DN(dn)) return NULL;
+
+		parent = of_node_to_eeh_dev(dn);
+		/* We're poking out of PCI territory */
+		if (!parent) return NULL;
+
+		if (parent->pe)
+			return parent->pe;
+
+		dn = dn->parent;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_add_to_parent_pe - Add EEH device to parent PE
+ * @edev: EEH device
+ *
+ * Add EEH device to the parent PE. If the parent PE already
+ * exists, the PE type will be changed to EEH_PE_BUS. Otherwise,
+ * we have to create new PE to hold the EEH device and the new
+ * PE will be linked to its parent PE as well.
+ */
+int eeh_add_to_parent_pe(struct eeh_dev *edev)
+{
+	struct eeh_pe *pe, *parent;
+
+	eeh_lock();
+
+	/*
+	 * Search the PE has been existing or not according
+	 * to the PE address. If that has been existing, the
+	 * PE should be composed of PCI bus and its subordinate
+	 * components.
+	 */
+	pe = eeh_pe_get(edev);
+	if (pe && !(pe->type & EEH_PE_INVALID)) {
+		if (!edev->pe_config_addr) {
+			eeh_unlock();
+			pr_err("%s: PE with addr 0x%x already exists\n",
+				__func__, edev->config_addr);
+			return -EEXIST;
+		}
+
+		/* Mark the PE as type of PCI bus */
+		pe->type = EEH_PE_BUS;
+		edev->pe = pe;
+
+		/* Put the edev to PE */
+		list_add_tail(&edev->list, &pe->edevs);
+		eeh_unlock();
+		pr_debug("EEH: Add %s to Bus PE#%x\n",
+			edev->dn->full_name, pe->addr);
+
+		return 0;
+	} else if (pe && (pe->type & EEH_PE_INVALID)) {
+		list_add_tail(&edev->list, &pe->edevs);
+		edev->pe = pe;
+		/*
+		 * We're running to here because of PCI hotplug caused by
+		 * EEH recovery. We need clear EEH_PE_INVALID until the top.
+		 */
+		parent = pe;
+		while (parent) {
+			if (!(parent->type & EEH_PE_INVALID))
+				break;
+			parent->type &= ~EEH_PE_INVALID;
+			parent = parent->parent;
+		}
+		eeh_unlock();
+		pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
+			edev->dn->full_name, pe->addr, pe->parent->addr);
+
+		return 0;
+	}
+
+	/* Create a new EEH PE */
+	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
+	if (!pe) {
+		eeh_unlock();
+		pr_err("%s: out of memory!\n", __func__);
+		return -ENOMEM;
+	}
+	pe->addr	= edev->pe_config_addr;
+	pe->config_addr	= edev->config_addr;
+
+	/*
+	 * Put the new EEH PE into hierarchy tree. If the parent
+	 * can't be found, the newly created PE will be attached
+	 * to PHB directly. Otherwise, we have to associate the
+	 * PE with its parent.
+	 */
+	parent = eeh_pe_get_parent(edev);
+	if (!parent) {
+		parent = eeh_phb_pe_get(edev->phb);
+		if (!parent) {
+			eeh_unlock();
+			pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
+				__func__, edev->phb->global_number);
+			edev->pe = NULL;
+			kfree(pe);
+			return -EEXIST;
+		}
+	}
+	pe->parent = parent;
+
+	/*
+	 * Put the newly created PE into the child list and
+	 * link the EEH device accordingly.
+	 */
+	list_add_tail(&pe->child, &parent->child_list);
+	list_add_tail(&edev->list, &pe->edevs);
+	edev->pe = pe;
+	eeh_unlock();
+	pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
+		edev->dn->full_name, pe->addr, pe->parent->addr);
+
+	return 0;
+}
+
+/**
+ * eeh_rmv_from_parent_pe - Remove one EEH device from the associated PE
+ * @edev: EEH device
+ * @purge_pe: remove PE or not
+ *
+ * The PE hierarchy tree might be changed when doing PCI hotplug.
+ * Also, the PCI devices or buses could be removed from the system
+ * during EEH recovery. So we have to call the function remove the
+ * corresponding PE accordingly if necessary.
+ */
+int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe)
+{
+	struct eeh_pe *pe, *parent, *child;
+	int cnt;
+
+	if (!edev->pe) {
+		pr_warning("%s: No PE found for EEH device %s\n",
+			__func__, edev->dn->full_name);
+		return -EEXIST;
+	}
+
+	eeh_lock();
+
+	/* Remove the EEH device */
+	pe = edev->pe;
+	edev->pe = NULL;
+	list_del(&edev->list);
+
+	/*
+	 * Check if the parent PE includes any EEH devices.
+	 * If not, we should delete that. Also, we should
+	 * delete the parent PE if it doesn't have associated
+	 * child PEs and EEH devices.
+	 */
+	while (1) {
+		parent = pe->parent;
+		if (pe->type & EEH_PE_PHB)
+			break;
+
+		if (purge_pe) {
+			if (list_empty(&pe->edevs) &&
+			    list_empty(&pe->child_list)) {
+				list_del(&pe->child);
+				kfree(pe);
+			} else {
+				break;
+			}
+		} else {
+			if (list_empty(&pe->edevs)) {
+				cnt = 0;
+				list_for_each_entry(child, &pe->child_list, child) {
+					if (!(child->type & EEH_PE_INVALID)) {
+						cnt++;
+						break;
+					}
+				}
+
+				if (!cnt)
+					pe->type |= EEH_PE_INVALID;
+				else
+					break;
+			}
+		}
+
+		pe = parent;
+	}
+
+	eeh_unlock();
+
+	return 0;
+}
+
+/**
+ * __eeh_pe_state_mark - Mark the state for the PE
+ * @data: EEH PE
+ * @flag: state
+ *
+ * The function is used to mark the indicated state for the given
+ * PE. Also, the associated PCI devices will be put into IO frozen
+ * state as well.
+ */
+static void *__eeh_pe_state_mark(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	int state = *((int *)flag);
+	struct eeh_dev *tmp;
+	struct pci_dev *pdev;
+
+	/*
+	 * Mark the PE with the indicated state. Also,
+	 * the associated PCI device will be put into
+	 * I/O frozen state to avoid I/O accesses from
+	 * the PCI device driver.
+	 */
+	pe->state |= state;
+	eeh_pe_for_each_dev(pe, tmp) {
+		pdev = eeh_dev_to_pci_dev(tmp);
+		if (pdev)
+			pdev->error_state = pci_channel_io_frozen;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_state_mark - Mark specified state for PE and its associated device
+ * @pe: EEH PE
+ *
+ * EEH error affects the current PE and its child PEs. The function
+ * is used to mark appropriate state for the affected PEs and the
+ * associated devices.
+ */
+void eeh_pe_state_mark(struct eeh_pe *pe, int state)
+{
+	eeh_lock();
+	eeh_pe_traverse(pe, __eeh_pe_state_mark, &state);
+	eeh_unlock();
+}
+
+/**
+ * __eeh_pe_state_clear - Clear state for the PE
+ * @data: EEH PE
+ * @flag: state
+ *
+ * The function is used to clear the indicated state from the
+ * given PE. Besides, we also clear the check count of the PE
+ * as well.
+ */
+static void *__eeh_pe_state_clear(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	int state = *((int *)flag);
+
+	pe->state &= ~state;
+	pe->check_count = 0;
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_state_clear - Clear state for the PE and its children
+ * @pe: PE
+ * @state: state to be cleared
+ *
+ * When the PE and its children has been recovered from error,
+ * we need clear the error state for that. The function is used
+ * for the purpose.
+ */
+void eeh_pe_state_clear(struct eeh_pe *pe, int state)
+{
+	eeh_lock();
+	eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
+	eeh_unlock();
+}
+
+/**
+ * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
+ * @data: EEH device
+ * @flag: Unused
+ *
+ * Loads the PCI configuration space base address registers,
+ * the expansion ROM base address, the latency timer, and etc.
+ * from the saved values in the device node.
+ */
+static void *eeh_restore_one_device_bars(void *data, void *flag)
+{
+	int i;
+	u32 cmd;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct device_node *dn = eeh_dev_to_of_node(edev);
+
+	for (i = 4; i < 10; i++)
+		eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
+	/* 12 == Expansion ROM Address */
+	eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]);
+
+#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
+#define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
+
+	eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1,
+		SAVED_BYTE(PCI_CACHE_LINE_SIZE));
+	eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1,
+		SAVED_BYTE(PCI_LATENCY_TIMER));
+
+	/* max latency, min grant, interrupt pin and line */
+	eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]);
+
+	/*
+	 * Restore PERR & SERR bits, some devices require it,
+	 * don't touch the other command bits
+	 */
+	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
+	if (edev->config_space[1] & PCI_COMMAND_PARITY)
+		cmd |= PCI_COMMAND_PARITY;
+	else
+		cmd &= ~PCI_COMMAND_PARITY;
+	if (edev->config_space[1] & PCI_COMMAND_SERR)
+		cmd |= PCI_COMMAND_SERR;
+	else
+		cmd &= ~PCI_COMMAND_SERR;
+	eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
+
+	return NULL;
+}
+
+/**
+ * eeh_pe_restore_bars - Restore the PCI config space info
+ * @pe: EEH PE
+ *
+ * This routine performs a recursive walk to the children
+ * of this device as well.
+ */
+void eeh_pe_restore_bars(struct eeh_pe *pe)
+{
+	/*
+	 * We needn't take the EEH lock since eeh_pe_dev_traverse()
+	 * will take that.
+	 */
+	eeh_pe_dev_traverse(pe, eeh_restore_one_device_bars, NULL);
+}
+
+/**
+ * eeh_pe_bus_get - Retrieve PCI bus according to the given PE
+ * @pe: EEH PE
+ *
+ * Retrieve the PCI bus according to the given PE. Basically,
+ * there're 3 types of PEs: PHB/Bus/Device. For PHB PE, the
+ * primary PCI bus will be retrieved. The parent bus will be
+ * returned for BUS PE. However, we don't have associated PCI
+ * bus for DEVICE PE.
+ */
+struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
+{
+	struct pci_bus *bus = NULL;
+	struct eeh_dev *edev;
+	struct pci_dev *pdev;
+
+	eeh_lock();
+
+	if (pe->type & EEH_PE_PHB) {
+		bus = pe->phb->bus;
+	} else if (pe->type & EEH_PE_BUS ||
+		   pe->type & EEH_PE_DEVICE) {
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
+		pdev = eeh_dev_to_pci_dev(edev);
+		if (pdev)
+			bus = pdev->bus;
+	}
+
+	eeh_unlock();
+
+	return bus;
+}
diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
new file mode 100644
index 0000000..d377083
--- /dev/null
+++ b/arch/powerpc/kernel/eeh_sysfs.c
@@ -0,0 +1,75 @@
+/*
+ * Sysfs entries for PCI Error Recovery for PAPR-compliant platform.
+ * Copyright IBM Corporation 2007
+ * Copyright Linas Vepstas <linas@austin.ibm.com> 2007
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
+ */
+#include <linux/pci.h>
+#include <linux/stat.h>
+#include <asm/ppc-pci.h>
+#include <asm/pci-bridge.h>
+
+/**
+ * EEH_SHOW_ATTR -- Create sysfs entry for eeh statistic
+ * @_name: name of file in sysfs directory
+ * @_memb: name of member in struct pci_dn to access
+ * @_format: printf format for display
+ *
+ * All of the attributes look very similar, so just
+ * auto-gen a cut-n-paste routine to display them.
+ */
+#define EEH_SHOW_ATTR(_name,_memb,_format)               \
+static ssize_t eeh_show_##_name(struct device *dev,      \
+		struct device_attribute *attr, char *buf)          \
+{                                                        \
+	struct pci_dev *pdev = to_pci_dev(dev);               \
+	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);      \
+	                                                      \
+	if (!edev)                                            \
+		return 0;                                     \
+	                                                      \
+	return sprintf(buf, _format "\n", edev->_memb);       \
+}                                                        \
+static DEVICE_ATTR(_name, S_IRUGO, eeh_show_##_name, NULL);
+
+EEH_SHOW_ATTR(eeh_mode,            mode,            "0x%x");
+EEH_SHOW_ATTR(eeh_config_addr,     config_addr,     "0x%x");
+EEH_SHOW_ATTR(eeh_pe_config_addr,  pe_config_addr,  "0x%x");
+
+void eeh_sysfs_add_device(struct pci_dev *pdev)
+{
+	int rc=0;
+
+	rc += device_create_file(&pdev->dev, &dev_attr_eeh_mode);
+	rc += device_create_file(&pdev->dev, &dev_attr_eeh_config_addr);
+	rc += device_create_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
+
+	if (rc)
+		printk(KERN_WARNING "EEH: Unable to create sysfs entries\n");
+}
+
+void eeh_sysfs_remove_device(struct pci_dev *pdev)
+{
+	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
+	device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr);
+	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
+}
+
diff --git a/arch/powerpc/kernel/pci_hotplug.c b/arch/powerpc/kernel/pci_hotplug.c
new file mode 100644
index 0000000..3f60880
--- /dev/null
+++ b/arch/powerpc/kernel/pci_hotplug.c
@@ -0,0 +1,111 @@
+/*
+ * Derived from "arch/powerpc/platforms/pseries/pci_dlpar.c"
+ *
+ * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com>
+ * Copyright (C) 2005 International Business Machines
+ *
+ * Updates, 2005, John Rose <johnrose@austin.ibm.com>
+ * Updates, 2005, Linas Vepstas <linas@austin.ibm.com>
+ * Updates, 2013, Gavin Shan <shangw@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/pci.h>
+#include <linux/export.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/firmware.h>
+#include <asm/eeh.h>
+
+/**
+ * __pcibios_remove_pci_devices - remove all devices under this bus
+ * @bus: the indicated PCI bus
+ * @purge_pe: destroy the PE on removal of PCI devices
+ *
+ * Remove all of the PCI devices under this bus both from the
+ * linux pci device tree, and from the powerpc EEH address cache.
+ * By default, the corresponding PE will be destroied during the
+ * normal PCI hotplug path. For PCI hotplug during EEH recovery,
+ * the corresponding PE won't be destroied and deallocated.
+ */
+void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe)
+{
+	struct pci_dev *dev, *tmp;
+	struct pci_bus *child_bus;
+
+	/* First go down child busses */
+	list_for_each_entry(child_bus, &bus->children, node)
+		__pcibios_remove_pci_devices(child_bus, purge_pe);
+
+	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
+		 pci_domain_nr(bus),  bus->number);
+	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+		pr_debug("     * Removing %s...\n", pci_name(dev));
+		eeh_remove_bus_device(dev, purge_pe);
+		pci_stop_and_remove_bus_device(dev);
+	}
+}
+
+/**
+ * pcibios_remove_pci_devices - remove all devices under this bus
+ * @bus: the indicated PCI bus
+ *
+ * Remove all of the PCI devices under this bus both from the
+ * linux pci device tree, and from the powerpc EEH address cache.
+ */
+void pcibios_remove_pci_devices(struct pci_bus *bus)
+{
+	__pcibios_remove_pci_devices(bus, 1);
+}
+EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
+
+/**
+ * pcibios_add_pci_devices - adds new pci devices to bus
+ * @bus: the indicated PCI bus
+ *
+ * This routine will find and fixup new pci devices under
+ * the indicated bus. This routine presumes that there
+ * might already be some devices under this bridge, so
+ * it carefully tries to add only new devices.  (And that
+ * is how this routine differs from other, similar pcibios
+ * routines.)
+ */
+void pcibios_add_pci_devices(struct pci_bus * bus)
+{
+	int slotno, num, mode, pass, max;
+	struct pci_dev *dev;
+	struct device_node *dn = pci_bus_to_OF_node(bus);
+
+	eeh_add_device_tree_early(dn);
+
+	mode = PCI_PROBE_NORMAL;
+	if (ppc_md.pci_probe_mode)
+		mode = ppc_md.pci_probe_mode(bus);
+
+	if (mode == PCI_PROBE_DEVTREE) {
+		/* use ofdt-based probe */
+		of_rescan_bus(dn, bus);
+	} else if (mode == PCI_PROBE_NORMAL) {
+		/* use legacy probe */
+		slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
+		num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+		if (!num)
+			return;
+		pcibios_setup_bus_devices(bus);
+		max = bus->busn_res.start;
+		for (pass = 0; pass < 2; pass++) {
+			list_for_each_entry(dev, &bus->devices, bus_list) {
+				if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
+				    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
+					max = pci_scan_bridge(bus, dev,
+							      max, pass);
+			}
+		}
+	}
+	pcibios_finish_adding_to_bus(bus);
+}
+EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index b62aab3..bed8c60 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -164,6 +164,11 @@ config IBMEBUS
 	help
 	  Bus device driver for GX bus based adapters.
 
+config EEH
+	bool
+	depends on (PPC_POWERNV || PPC_PSERIES) && PCI
+	default y
+
 config PPC_MPC106
 	bool
 	default n
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 4459eff..1bd3399 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -33,11 +33,6 @@ config PPC_SPLPAR
 	  processors, that is, which share physical processors between
 	  two or more partitions.
 
-config EEH
-	bool
-	depends on PPC_PSERIES && PCI
-	default y
-
 config PSERIES_MSI
        bool
        depends on PCI_MSI && EEH
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 53866e5..8ae0103 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,9 +6,7 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
 			   firmware.o power.o dlpar.o mobility.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_SCANLOG)	+= scanlog.o
-obj-$(CONFIG_EEH)	+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
-			   eeh_driver.o eeh_event.o eeh_sysfs.o \
-			   eeh_pseries.o
+obj-$(CONFIG_EEH)	+= eeh_pseries.o
 obj-$(CONFIG_KEXEC)	+= kexec.o
 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
deleted file mode 100644
index 6b73d6c..0000000
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ /dev/null
@@ -1,942 +0,0 @@
-/*
- * Copyright IBM Corporation 2001, 2005, 2006
- * Copyright Dave Engebretsen & Todd Inglett 2001
- * Copyright Linas Vepstas 2005, 2006
- * Copyright 2001-2012 IBM Corporation.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- *
- * Please address comments and feedback to Linas Vepstas <linas@austin.ibm.com>
- */
-
-#include <linux/delay.h>
-#include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/pci.h>
-#include <linux/proc_fs.h>
-#include <linux/rbtree.h>
-#include <linux/seq_file.h>
-#include <linux/spinlock.h>
-#include <linux/export.h>
-#include <linux/of.h>
-
-#include <linux/atomic.h>
-#include <asm/eeh.h>
-#include <asm/eeh_event.h>
-#include <asm/io.h>
-#include <asm/machdep.h>
-#include <asm/ppc-pci.h>
-#include <asm/rtas.h>
-
-
-/** Overview:
- *  EEH, or "Extended Error Handling" is a PCI bridge technology for
- *  dealing with PCI bus errors that can't be dealt with within the
- *  usual PCI framework, except by check-stopping the CPU.  Systems
- *  that are designed for high-availability/reliability cannot afford
- *  to crash due to a "mere" PCI error, thus the need for EEH.
- *  An EEH-capable bridge operates by converting a detected error
- *  into a "slot freeze", taking the PCI adapter off-line, making
- *  the slot behave, from the OS'es point of view, as if the slot
- *  were "empty": all reads return 0xff's and all writes are silently
- *  ignored.  EEH slot isolation events can be triggered by parity
- *  errors on the address or data busses (e.g. during posted writes),
- *  which in turn might be caused by low voltage on the bus, dust,
- *  vibration, humidity, radioactivity or plain-old failed hardware.
- *
- *  Note, however, that one of the leading causes of EEH slot
- *  freeze events are buggy device drivers, buggy device microcode,
- *  or buggy device hardware.  This is because any attempt by the
- *  device to bus-master data to a memory address that is not
- *  assigned to the device will trigger a slot freeze.   (The idea
- *  is to prevent devices-gone-wild from corrupting system memory).
- *  Buggy hardware/drivers will have a miserable time co-existing
- *  with EEH.
- *
- *  Ideally, a PCI device driver, when suspecting that an isolation
- *  event has occurred (e.g. by reading 0xff's), will then ask EEH
- *  whether this is the case, and then take appropriate steps to
- *  reset the PCI slot, the PCI device, and then resume operations.
- *  However, until that day,  the checking is done here, with the
- *  eeh_check_failure() routine embedded in the MMIO macros.  If
- *  the slot is found to be isolated, an "EEH Event" is synthesized
- *  and sent out for processing.
- */
-
-/* If a device driver keeps reading an MMIO register in an interrupt
- * handler after a slot isolation event, it might be broken.
- * This sets the threshold for how many read attempts we allow
- * before printing an error message.
- */
-#define EEH_MAX_FAILS	2100000
-
-/* Time to wait for a PCI slot to report status, in milliseconds */
-#define PCI_BUS_RESET_WAIT_MSEC (60*1000)
-
-/* Platform dependent EEH operations */
-struct eeh_ops *eeh_ops = NULL;
-
-int eeh_subsystem_enabled;
-EXPORT_SYMBOL(eeh_subsystem_enabled);
-
-/*
- * EEH probe mode support. The intention is to support multiple
- * platforms for EEH. Some platforms like pSeries do PCI emunation
- * based on device tree. However, other platforms like powernv probe
- * PCI devices from hardware. The flag is used to distinguish that.
- * In addition, struct eeh_ops::probe would be invoked for particular
- * OF node or PCI device so that the corresponding PE would be created
- * there.
- */
-int eeh_probe_mode;
-
-/* Global EEH mutex */
-DEFINE_MUTEX(eeh_mutex);
-
-/* Lock to avoid races due to multiple reports of an error */
-static DEFINE_RAW_SPINLOCK(confirm_error_lock);
-
-/* Buffer for reporting pci register dumps. Its here in BSS, and
- * not dynamically alloced, so that it ends up in RMO where RTAS
- * can access it.
- */
-#define EEH_PCI_REGS_LOG_LEN 4096
-static unsigned char pci_regs_buf[EEH_PCI_REGS_LOG_LEN];
-
-/*
- * The struct is used to maintain the EEH global statistic
- * information. Besides, the EEH global statistics will be
- * exported to user space through procfs
- */
-struct eeh_stats {
-	u64 no_device;		/* PCI device not found		*/
-	u64 no_dn;		/* OF node not found		*/
-	u64 no_cfg_addr;	/* Config address not found	*/
-	u64 ignored_check;	/* EEH check skipped		*/
-	u64 total_mmio_ffs;	/* Total EEH checks		*/
-	u64 false_positives;	/* Unnecessary EEH checks	*/
-	u64 slot_resets;	/* PE reset			*/
-};
-
-static struct eeh_stats eeh_stats;
-
-#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
-
-/**
- * eeh_gather_pci_data - Copy assorted PCI config space registers to buff
- * @edev: device to report data for
- * @buf: point to buffer in which to log
- * @len: amount of room in buffer
- *
- * This routine captures assorted PCI configuration space data,
- * and puts them into a buffer for RTAS error logging.
- */
-static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
-{
-	struct device_node *dn = eeh_dev_to_of_node(edev);
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	u32 cfg;
-	int cap, i;
-	int n = 0;
-
-	n += scnprintf(buf+n, len-n, "%s\n", dn->full_name);
-	printk(KERN_WARNING "EEH: of node=%s\n", dn->full_name);
-
-	eeh_ops->read_config(dn, PCI_VENDOR_ID, 4, &cfg);
-	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
-	printk(KERN_WARNING "EEH: PCI device/vendor: %08x\n", cfg);
-
-	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cfg);
-	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
-	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);
-
-	if (!dev) {
-		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
-		return n;
-	}
-
-	/* Gather bridge-specific registers */
-	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
-		eeh_ops->read_config(dn, PCI_SEC_STATUS, 2, &cfg);
-		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
-		printk(KERN_WARNING "EEH: Bridge secondary status: %04x\n", cfg);
-
-		eeh_ops->read_config(dn, PCI_BRIDGE_CONTROL, 2, &cfg);
-		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
-		printk(KERN_WARNING "EEH: Bridge control: %04x\n", cfg);
-	}
-
-	/* Dump out the PCI-X command and status regs */
-	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
-	if (cap) {
-		eeh_ops->read_config(dn, cap, 4, &cfg);
-		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
-		printk(KERN_WARNING "EEH: PCI-X cmd: %08x\n", cfg);
-
-		eeh_ops->read_config(dn, cap+4, 4, &cfg);
-		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
-		printk(KERN_WARNING "EEH: PCI-X status: %08x\n", cfg);
-	}
-
-	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
-	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
-	if (cap) {
-		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
-		printk(KERN_WARNING
-		       "EEH: PCI-E capabilities and status follow:\n");
-
-		for (i=0; i<=8; i++) {
-			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
-			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
-			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
-		}
-
-		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
-		if (cap) {
-			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
-			printk(KERN_WARNING
-			       "EEH: PCI-E AER capability register set follows:\n");
-
-			for (i=0; i<14; i++) {
-				eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
-				n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
-				printk(KERN_WARNING "EEH: PCI-E AER %02x: %08x\n", i, cfg);
-			}
-		}
-	}
-
-	return n;
-}
-
-/**
- * eeh_slot_error_detail - Generate combined log including driver log and error log
- * @pe: EEH PE
- * @severity: temporary or permanent error log
- *
- * This routine should be called to generate the combined log, which
- * is comprised of driver log and error log. The driver log is figured
- * out from the config space of the corresponding PCI device, while
- * the error log is fetched through platform dependent function call.
- */
-void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
-{
-	size_t loglen = 0;
-	struct eeh_dev *edev;
-
-	eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
-	eeh_ops->configure_bridge(pe);
-	eeh_pe_restore_bars(pe);
-
-	pci_regs_buf[0] = 0;
-	eeh_pe_for_each_dev(pe, edev) {
-		loglen += eeh_gather_pci_data(edev, pci_regs_buf,
-				EEH_PCI_REGS_LOG_LEN);
-        }
-
-	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
-}
-
-/**
- * eeh_token_to_phys - Convert EEH address token to phys address
- * @token: I/O token, should be address in the form 0xA....
- *
- * This routine should be called to convert virtual I/O address
- * to physical one.
- */
-static inline unsigned long eeh_token_to_phys(unsigned long token)
-{
-	pte_t *ptep;
-	unsigned long pa;
-
-	ptep = find_linux_pte(init_mm.pgd, token);
-	if (!ptep)
-		return token;
-	pa = pte_pfn(*ptep) << PAGE_SHIFT;
-
-	return pa | (token & (PAGE_SIZE-1));
-}
-
-/**
- * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
- * @edev: eeh device
- *
- * Check for an EEH failure for the given device node.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze.  This routine
- * will query firmware for the EEH status.
- *
- * Returns 0 if there has not been an EEH error; otherwise returns
- * a non-zero value and queues up a slot isolation event notification.
- *
- * It is safe to call this routine in an interrupt context.
- */
-int eeh_dev_check_failure(struct eeh_dev *edev)
-{
-	int ret;
-	unsigned long flags;
-	struct device_node *dn;
-	struct pci_dev *dev;
-	struct eeh_pe *pe;
-	int rc = 0;
-	const char *location;
-
-	eeh_stats.total_mmio_ffs++;
-
-	if (!eeh_subsystem_enabled)
-		return 0;
-
-	if (!edev) {
-		eeh_stats.no_dn++;
-		return 0;
-	}
-	dn = eeh_dev_to_of_node(edev);
-	dev = eeh_dev_to_pci_dev(edev);
-	pe = edev->pe;
-
-	/* Access to IO BARs might get this far and still not want checking. */
-	if (!pe) {
-		eeh_stats.ignored_check++;
-		pr_debug("EEH: Ignored check for %s %s\n",
-			eeh_pci_name(dev), dn->full_name);
-		return 0;
-	}
-
-	if (!pe->addr && !pe->config_addr) {
-		eeh_stats.no_cfg_addr++;
-		return 0;
-	}
-
-	/* If we already have a pending isolation event for this
-	 * slot, we know it's bad already, we don't need to check.
-	 * Do this checking under a lock; as multiple PCI devices
-	 * in one slot might report errors simultaneously, and we
-	 * only want one error recovery routine running.
-	 */
-	raw_spin_lock_irqsave(&confirm_error_lock, flags);
-	rc = 1;
-	if (pe->state & EEH_PE_ISOLATED) {
-		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
-			location = of_get_property(dn, "ibm,loc-code", NULL);
-			printk(KERN_ERR "EEH: %d reads ignored for recovering device at "
-				"location=%s driver=%s pci addr=%s\n",
-				pe->check_count, location,
-				eeh_driver_name(dev), eeh_pci_name(dev));
-			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
-				eeh_driver_name(dev));
-			dump_stack();
-		}
-		goto dn_unlock;
-	}
-
-	/*
-	 * Now test for an EEH failure.  This is VERY expensive.
-	 * Note that the eeh_config_addr may be a parent device
-	 * in the case of a device behind a bridge, or it may be
-	 * function zero of a multi-function device.
-	 * In any case they must share a common PHB.
-	 */
-	ret = eeh_ops->get_state(pe, NULL);
-
-	/* Note that config-io to empty slots may fail;
-	 * they are empty when they don't have children.
-	 * We will punt with the following conditions: Failure to get
-	 * PE's state, EEH not support and Permanently unavailable
-	 * state, PE is in good state.
-	 */
-	if ((ret < 0) ||
-	    (ret == EEH_STATE_NOT_SUPPORT) ||
-	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
-	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
-		eeh_stats.false_positives++;
-		pe->false_positives++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	eeh_stats.slot_resets++;
- 
-	/* Avoid repeated reports of this failure, including problems
-	 * with other functions on this device, and functions under
-	 * bridges.
-	 */
-	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
-	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
-
-	eeh_send_failure_event(pe);
-
-	/* Most EEH events are due to device driver bugs.  Having
-	 * a stack trace will help the device-driver authors figure
-	 * out what happened.  So print that out.
-	 */
-	WARN(1, "EEH: failure detected\n");
-	return 1;
-
-dn_unlock:
-	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
-	return rc;
-}
-
-EXPORT_SYMBOL_GPL(eeh_dev_check_failure);
-
-/**
- * eeh_check_failure - Check if all 1's data is due to EEH slot freeze
- * @token: I/O token, should be address in the form 0xA....
- * @val: value, should be all 1's (XXX why do we need this arg??)
- *
- * Check for an EEH failure at the given token address.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze event.  This routine
- * will query firmware for the EEH status.
- *
- * Note this routine is safe to call in an interrupt context.
- */
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
-{
-	unsigned long addr;
-	struct eeh_dev *edev;
-
-	/* Finding the phys addr + pci device; this is pretty quick. */
-	addr = eeh_token_to_phys((unsigned long __force) token);
-	edev = eeh_addr_cache_get_dev(addr);
-	if (!edev) {
-		eeh_stats.no_device++;
-		return val;
-	}
-
-	eeh_dev_check_failure(edev);
-
-	pci_dev_put(eeh_dev_to_pci_dev(edev));
-	return val;
-}
-
-EXPORT_SYMBOL(eeh_check_failure);
-
-
-/**
- * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
- * @pe: EEH PE
- *
- * This routine should be called to reenable frozen MMIO or DMA
- * so that it would work correctly again. It's useful while doing
- * recovery or log collection on the indicated device.
- */
-int eeh_pci_enable(struct eeh_pe *pe, int function)
-{
-	int rc;
-
-	rc = eeh_ops->set_option(pe, function);
-	if (rc)
-		pr_warning("%s: Unexpected state change %d on PHB#%d-PE#%x, err=%d\n",
-			__func__, function, pe->phb->global_number, pe->addr, rc);
-
-	rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-	if (rc > 0 && (rc & EEH_STATE_MMIO_ENABLED) &&
-	   (function == EEH_OPT_THAW_MMIO))
-		return 0;
-
-	return rc;
-}
-
-/**
- * pcibios_set_pcie_slot_reset - Set PCI-E reset state
- * @dev: pci device struct
- * @state: reset state to enter
- *
- * Return value:
- * 	0 if success
- */
-int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
-{
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-	struct eeh_pe *pe = edev->pe;
-
-	if (!pe) {
-		pr_err("%s: No PE found on PCI device %s\n",
-			__func__, pci_name(dev));
-		return -EINVAL;
-	}
-
-	switch (state) {
-	case pcie_deassert_reset:
-		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
-		break;
-	case pcie_hot_reset:
-		eeh_ops->reset(pe, EEH_RESET_HOT);
-		break;
-	case pcie_warm_reset:
-		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	return 0;
-}
-
-/**
- * eeh_set_pe_freset - Check the required reset for the indicated device
- * @data: EEH device
- * @flag: return value
- *
- * Each device might have its preferred reset type: fundamental or
- * hot reset. The routine is used to collected the information for
- * the indicated device and its children so that the bunch of the
- * devices could be reset properly.
- */
-static void *eeh_set_dev_freset(void *data, void *flag)
-{
-	struct pci_dev *dev;
-	unsigned int *freset = (unsigned int *)flag;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-
-	dev = eeh_dev_to_pci_dev(edev);
-	if (dev)
-		*freset |= dev->needs_freset;
-
-	return NULL;
-}
-
-/**
- * eeh_reset_pe_once - Assert the pci #RST line for 1/4 second
- * @pe: EEH PE
- *
- * Assert the PCI #RST line for 1/4 second.
- */
-static void eeh_reset_pe_once(struct eeh_pe *pe)
-{
-	unsigned int freset = 0;
-
-	/* Determine type of EEH reset required for
-	 * Partitionable Endpoint, a hot-reset (1)
-	 * or a fundamental reset (3).
-	 * A fundamental reset required by any device under
-	 * Partitionable Endpoint trumps hot-reset.
-  	 */
-	eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
-
-	if (freset)
-		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
-	else
-		eeh_ops->reset(pe, EEH_RESET_HOT);
-
-	/* The PCI bus requires that the reset be held high for at least
-	 * a 100 milliseconds. We wait a bit longer 'just in case'.
-	 */
-#define PCI_BUS_RST_HOLD_TIME_MSEC 250
-	msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
-	
-	/* We might get hit with another EEH freeze as soon as the 
-	 * pci slot reset line is dropped. Make sure we don't miss
-	 * these, and clear the flag now.
-	 */
-	eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
-
-	eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
-
-	/* After a PCI slot has been reset, the PCI Express spec requires
-	 * a 1.5 second idle time for the bus to stabilize, before starting
-	 * up traffic.
-	 */
-#define PCI_BUS_SETTLE_TIME_MSEC 1800
-	msleep(PCI_BUS_SETTLE_TIME_MSEC);
-}
-
-/**
- * eeh_reset_pe - Reset the indicated PE
- * @pe: EEH PE
- *
- * This routine should be called to reset indicated device, including
- * PE. A PE might include multiple PCI devices and sometimes PCI bridges
- * might be involved as well.
- */
-int eeh_reset_pe(struct eeh_pe *pe)
-{
-	int i, rc;
-
-	/* Take three shots at resetting the bus */
-	for (i=0; i<3; i++) {
-		eeh_reset_pe_once(pe);
-
-		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
-			return 0;
-
-		if (rc < 0) {
-			pr_err("%s: Unrecoverable slot failure on PHB#%d-PE#%x",
-				__func__, pe->phb->global_number, pe->addr);
-			return -1;
-		}
-		pr_err("EEH: bus reset %d failed on PHB#%d-PE#%x, rc=%d\n",
-			i+1, pe->phb->global_number, pe->addr, rc);
-	}
-
-	return -1;
-}
-
-/**
- * eeh_save_bars - Save device bars
- * @edev: PCI device associated EEH device
- *
- * Save the values of the device bars. Unlike the restore
- * routine, this routine is *not* recursive. This is because
- * PCI devices are added individually; but, for the restore,
- * an entire slot is reset at a time.
- */
-void eeh_save_bars(struct eeh_dev *edev)
-{
-	int i;
-	struct device_node *dn;
-
-	if (!edev)
-		return;
-	dn = eeh_dev_to_of_node(edev);
-	
-	for (i = 0; i < 16; i++)
-		eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
-}
-
-/**
- * eeh_ops_register - Register platform dependent EEH operations
- * @ops: platform dependent EEH operations
- *
- * Register the platform dependent EEH operation callback
- * functions. The platform should call this function before
- * any other EEH operations.
- */
-int __init eeh_ops_register(struct eeh_ops *ops)
-{
-	if (!ops->name) {
-		pr_warning("%s: Invalid EEH ops name for %p\n",
-			__func__, ops);
-		return -EINVAL;
-	}
-
-	if (eeh_ops && eeh_ops != ops) {
-		pr_warning("%s: EEH ops of platform %s already existing (%s)\n",
-			__func__, eeh_ops->name, ops->name);
-		return -EEXIST;
-	}
-
-	eeh_ops = ops;
-
-	return 0;
-}
-
-/**
- * eeh_ops_unregister - Unreigster platform dependent EEH operations
- * @name: name of EEH platform operations
- *
- * Unregister the platform dependent EEH operation callback
- * functions.
- */
-int __exit eeh_ops_unregister(const char *name)
-{
-	if (!name || !strlen(name)) {
-		pr_warning("%s: Invalid EEH ops name\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	if (eeh_ops && !strcmp(eeh_ops->name, name)) {
-		eeh_ops = NULL;
-		return 0;
-	}
-
-	return -EEXIST;
-}
-
-/**
- * eeh_init - EEH initialization
- *
- * Initialize EEH by trying to enable it for all of the adapters in the system.
- * As a side effect we can determine here if eeh is supported at all.
- * Note that we leave EEH on so failed config cycles won't cause a machine
- * check.  If a user turns off EEH for a particular adapter they are really
- * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
- * grant access to a slot if EEH isn't enabled, and so we always enable
- * EEH for all slots/all devices.
- *
- * The eeh-force-off option disables EEH checking globally, for all slots.
- * Even if force-off is set, the EEH hardware is still enabled, so that
- * newer systems can boot.
- */
-static int __init eeh_init(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct device_node *phb;
-	int ret;
-
-	/* call platform initialization function */
-	if (!eeh_ops) {
-		pr_warning("%s: Platform EEH operation not found\n",
-			__func__);
-		return -EEXIST;
-	} else if ((ret = eeh_ops->init())) {
-		pr_warning("%s: Failed to call platform init function (%d)\n",
-			__func__, ret);
-		return ret;
-	}
-
-	raw_spin_lock_init(&confirm_error_lock);
-
-	/* Enable EEH for all adapters */
-	if (eeh_probe_mode_devtree()) {
-		list_for_each_entry_safe(hose, tmp,
-			&hose_list, list_node) {
-			phb = hose->dn;
-			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
-		}
-	}
-
-	if (eeh_subsystem_enabled)
-		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
-	else
-		pr_warning("EEH: No capable adapters found\n");
-
-	return ret;
-}
-
-core_initcall_sync(eeh_init);
-
-/**
- * eeh_add_device_early - Enable EEH for the indicated device_node
- * @dn: device node for which to set up EEH
- *
- * This routine must be used to perform EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- * This routine must be called before any i/o is performed to the
- * adapter (inluding any config-space i/o).
- * Whether this actually enables EEH or not for this device depends
- * on the CEC architecture, type of the device, on earlier boot
- * command-line arguments & etc.
- */
-static void eeh_add_device_early(struct device_node *dn)
-{
-	struct pci_controller *phb;
-
-	if (!of_node_to_eeh_dev(dn))
-		return;
-	phb = of_node_to_eeh_dev(dn)->phb;
-
-	/* USB Bus children of PCI devices will not have BUID's */
-	if (NULL == phb || 0 == phb->buid)
-		return;
-
-	/* FIXME: hotplug support on POWERNV */
-	eeh_ops->of_probe(dn, NULL);
-}
-
-/**
- * eeh_add_device_tree_early - Enable EEH for the indicated device
- * @dn: device node
- *
- * This routine must be used to perform EEH initialization for the
- * indicated PCI device that was added after system boot (e.g.
- * hotplug, dlpar).
- */
-void eeh_add_device_tree_early(struct device_node *dn)
-{
-	struct device_node *sib;
-
-	for_each_child_of_node(dn, sib)
-		eeh_add_device_tree_early(sib);
-	eeh_add_device_early(dn);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_tree_early);
-
-/**
- * eeh_add_device_late - Perform EEH initialization for the indicated pci device
- * @dev: pci device for which to set up EEH
- *
- * This routine must be used to complete EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- */
-static void eeh_add_device_late(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	struct eeh_dev *edev;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-	pr_debug("EEH: Adding device %s\n", pci_name(dev));
-
-	dn = pci_device_to_OF_node(dev);
-	edev = of_node_to_eeh_dev(dn);
-	if (edev->pdev == dev) {
-		pr_debug("EEH: Already referenced !\n");
-		return;
-	}
-	WARN_ON(edev->pdev);
-
-	pci_dev_get(dev);
-	edev->pdev = dev;
-	dev->dev.archdata.edev = edev;
-
-	eeh_addr_cache_insert_dev(dev);
-}
-
-/**
- * eeh_add_device_tree_late - Perform EEH initialization for the indicated PCI bus
- * @bus: PCI bus
- *
- * This routine must be used to perform EEH initialization for PCI
- * devices which are attached to the indicated PCI bus. The PCI bus
- * is added after system boot through hotplug or dlpar.
- */
-void eeh_add_device_tree_late(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
- 		eeh_add_device_late(dev);
- 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
- 			struct pci_bus *subbus = dev->subordinate;
- 			if (subbus)
- 				eeh_add_device_tree_late(subbus);
- 		}
-	}
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
-
-/**
- * eeh_add_sysfs_files - Add EEH sysfs files for the indicated PCI bus
- * @bus: PCI bus
- *
- * This routine must be used to add EEH sysfs files for PCI
- * devices which are attached to the indicated PCI bus. The PCI bus
- * is added after system boot through hotplug or dlpar.
- */
-void eeh_add_sysfs_files(struct pci_bus *bus)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		eeh_sysfs_add_device(dev);
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-			struct pci_bus *subbus = dev->subordinate;
-			if (subbus)
-				eeh_add_sysfs_files(subbus);
-		}
-	}
-}
-EXPORT_SYMBOL_GPL(eeh_add_sysfs_files);
-
-/**
- * eeh_remove_device - Undo EEH setup for the indicated pci device
- * @dev: pci device to be removed
- * @purge_pe: remove the PE or not
- *
- * This routine should be called when a device is removed from
- * a running system (e.g. by hotplug or dlpar).  It unregisters
- * the PCI device from the EEH subsystem.  I/O errors affecting
- * this device will no longer be detected after this call; thus,
- * i/o errors affecting this slot may leave this device unusable.
- */
-static void eeh_remove_device(struct pci_dev *dev, int purge_pe)
-{
-	struct eeh_dev *edev;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-	edev = pci_dev_to_eeh_dev(dev);
-
-	/* Unregister the device with the EEH/PCI address search system */
-	pr_debug("EEH: Removing device %s\n", pci_name(dev));
-
-	if (!edev || !edev->pdev) {
-		pr_debug("EEH: Not referenced !\n");
-		return;
-	}
-	edev->pdev = NULL;
-	dev->dev.archdata.edev = NULL;
-	pci_dev_put(dev);
-
-	eeh_rmv_from_parent_pe(edev, purge_pe);
-	eeh_addr_cache_rmv_dev(dev);
-	eeh_sysfs_remove_device(dev);
-}
-
-/**
- * eeh_remove_bus_device - Undo EEH setup for the indicated PCI device
- * @dev: PCI device
- * @purge_pe: remove the corresponding PE or not
- *
- * This routine must be called when a device is removed from the
- * running system through hotplug or dlpar. The corresponding
- * PCI address cache will be removed.
- */
-void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe)
-{
-	struct pci_bus *bus = dev->subordinate;
-	struct pci_dev *child, *tmp;
-
-	eeh_remove_device(dev, purge_pe);
-
-	if (bus && dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
-			 eeh_remove_bus_device(child, purge_pe);
-	}
-}
-EXPORT_SYMBOL_GPL(eeh_remove_bus_device);
-
-static int proc_eeh_show(struct seq_file *m, void *v)
-{
-	if (0 == eeh_subsystem_enabled) {
-		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%llu\n", eeh_stats.total_mmio_ffs);
-	} else {
-		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m,
-				"no device=%llu\n"
-				"no device node=%llu\n"
-				"no config address=%llu\n"
-				"check not wanted=%llu\n"
-				"eeh_total_mmio_ffs=%llu\n"
-				"eeh_false_positives=%llu\n"
-				"eeh_slot_resets=%llu\n",
-				eeh_stats.no_device,
-				eeh_stats.no_dn,
-				eeh_stats.no_cfg_addr,
-				eeh_stats.ignored_check,
-				eeh_stats.total_mmio_ffs,
-				eeh_stats.false_positives,
-				eeh_stats.slot_resets);
-	}
-
-	return 0;
-}
-
-static int proc_eeh_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, proc_eeh_show, NULL);
-}
-
-static const struct file_operations proc_eeh_operations = {
-	.open      = proc_eeh_open,
-	.read      = seq_read,
-	.llseek    = seq_lseek,
-	.release   = single_release,
-};
-
-static int __init eeh_init_proc(void)
-{
-	if (machine_is(pseries))
-		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
-	return 0;
-}
-__initcall(eeh_init_proc);
diff --git a/arch/powerpc/platforms/pseries/eeh_cache.c b/arch/powerpc/platforms/pseries/eeh_cache.c
deleted file mode 100644
index 5a4c879..0000000
--- a/arch/powerpc/platforms/pseries/eeh_cache.c
+++ /dev/null
@@ -1,319 +0,0 @@
-/*
- * PCI address cache; allows the lookup of PCI devices based on I/O address
- *
- * Copyright IBM Corporation 2004
- * Copyright Linas Vepstas <linas@austin.ibm.com> 2004
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-
-#include <linux/list.h>
-#include <linux/pci.h>
-#include <linux/rbtree.h>
-#include <linux/slab.h>
-#include <linux/spinlock.h>
-#include <linux/atomic.h>
-#include <asm/pci-bridge.h>
-#include <asm/ppc-pci.h>
-
-
-/**
- * The pci address cache subsystem.  This subsystem places
- * PCI device address resources into a red-black tree, sorted
- * according to the address range, so that given only an i/o
- * address, the corresponding PCI device can be **quickly**
- * found. It is safe to perform an address lookup in an interrupt
- * context; this ability is an important feature.
- *
- * Currently, the only customer of this code is the EEH subsystem;
- * thus, this code has been somewhat tailored to suit EEH better.
- * In particular, the cache does *not* hold the addresses of devices
- * for which EEH is not enabled.
- *
- * (Implementation Note: The RB tree seems to be better/faster
- * than any hash algo I could think of for this problem, even
- * with the penalty of slow pointer chases for d-cache misses).
- */
-struct pci_io_addr_range {
-	struct rb_node rb_node;
-	unsigned long addr_lo;
-	unsigned long addr_hi;
-	struct eeh_dev *edev;
-	struct pci_dev *pcidev;
-	unsigned int flags;
-};
-
-static struct pci_io_addr_cache {
-	struct rb_root rb_root;
-	spinlock_t piar_lock;
-} pci_io_addr_cache_root;
-
-static inline struct eeh_dev *__eeh_addr_cache_get_device(unsigned long addr)
-{
-	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
-
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (addr < piar->addr_lo) {
-			n = n->rb_left;
-		} else {
-			if (addr > piar->addr_hi) {
-				n = n->rb_right;
-			} else {
-				pci_dev_get(piar->pcidev);
-				return piar->edev;
-			}
-		}
-	}
-
-	return NULL;
-}
-
-/**
- * eeh_addr_cache_get_dev - Get device, given only address
- * @addr: mmio (PIO) phys address or i/o port number
- *
- * Given an mmio phys address, or a port number, find a pci device
- * that implements this address.  Be sure to pci_dev_put the device
- * when finished.  I/O port numbers are assumed to be offset
- * from zero (that is, they do *not* have pci_io_addr added in).
- * It is safe to call this function within an interrupt.
- */
-struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr)
-{
-	struct eeh_dev *edev;
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	edev = __eeh_addr_cache_get_device(addr);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-	return edev;
-}
-
-#ifdef DEBUG
-/*
- * Handy-dandy debug print routine, does nothing more
- * than print out the contents of our addr cache.
- */
-static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
-{
-	struct rb_node *n;
-	int cnt = 0;
-
-	n = rb_first(&cache->rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		pr_debug("PCI: %s addr range %d [%lx-%lx]: %s\n",
-		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
-		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
-		cnt++;
-		n = rb_next(n);
-	}
-}
-#endif
-
-/* Insert address range into the rb tree. */
-static struct pci_io_addr_range *
-eeh_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
-		      unsigned long ahi, unsigned int flags)
-{
-	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct pci_io_addr_range *piar;
-
-	/* Walk tree, find a place to insert into tree */
-	while (*p) {
-		parent = *p;
-		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (ahi < piar->addr_lo) {
-			p = &parent->rb_left;
-		} else if (alo > piar->addr_hi) {
-			p = &parent->rb_right;
-		} else {
-			if (dev != piar->pcidev ||
-			    alo != piar->addr_lo || ahi != piar->addr_hi) {
-				pr_warning("PIAR: overlapping address range\n");
-			}
-			return piar;
-		}
-	}
-	piar = kzalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
-	if (!piar)
-		return NULL;
-
-	pci_dev_get(dev);
-	piar->addr_lo = alo;
-	piar->addr_hi = ahi;
-	piar->edev = pci_dev_to_eeh_dev(dev);
-	piar->pcidev = dev;
-	piar->flags = flags;
-
-#ifdef DEBUG
-	pr_debug("PIAR: insert range=[%lx:%lx] dev=%s\n",
-	                  alo, ahi, pci_name(dev));
-#endif
-
-	rb_link_node(&piar->rb_node, parent, p);
-	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
-
-	return piar;
-}
-
-static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	struct eeh_dev *edev;
-	int i;
-
-	dn = pci_device_to_OF_node(dev);
-	if (!dn) {
-		pr_warning("PCI: no pci dn found for dev=%s\n", pci_name(dev));
-		return;
-	}
-
-	edev = of_node_to_eeh_dev(dn);
-	if (!edev) {
-		pr_warning("PCI: no EEH dev found for dn=%s\n",
-			dn->full_name);
-		return;
-	}
-
-	/* Skip any devices for which EEH is not enabled. */
-	if (!edev->pe) {
-#ifdef DEBUG
-		pr_info("PCI: skip building address cache for=%s - %s\n",
-			pci_name(dev), dn->full_name);
-#endif
-		return;
-	}
-
-	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-		unsigned long start = pci_resource_start(dev,i);
-		unsigned long end = pci_resource_end(dev,i);
-		unsigned int flags = pci_resource_flags(dev,i);
-
-		/* We are interested only bus addresses, not dma or other stuff */
-		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
-			 continue;
-		eeh_addr_cache_insert(dev, start, end, flags);
-	}
-}
-
-/**
- * eeh_addr_cache_insert_dev - Add a device to the address cache
- * @dev: PCI device whose I/O addresses we are interested in.
- *
- * In order to support the fast lookup of devices based on addresses,
- * we maintain a cache of devices that can be quickly searched.
- * This routine adds a device to that cache.
- */
-void eeh_addr_cache_insert_dev(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	/* Ignore PCI bridges */
-	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
-		return;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__eeh_addr_cache_insert_dev(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
-{
-	struct rb_node *n;
-
-restart:
-	n = rb_first(&pci_io_addr_cache_root.rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (piar->pcidev == dev) {
-			rb_erase(n, &pci_io_addr_cache_root.rb_root);
-			pci_dev_put(piar->pcidev);
-			kfree(piar);
-			goto restart;
-		}
-		n = rb_next(n);
-	}
-}
-
-/**
- * eeh_addr_cache_rmv_dev - remove pci device from addr cache
- * @dev: device to remove
- *
- * Remove a device from the addr-cache tree.
- * This is potentially expensive, since it will walk
- * the tree multiple times (once per resource).
- * But so what; device removal doesn't need to be that fast.
- */
-void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__eeh_addr_cache_rmv_dev(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-/**
- * eeh_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scanned for devices (after all device resources are known).
- */
-void __init eeh_addr_cache_build(void)
-{
-	struct device_node *dn;
-	struct eeh_dev *edev;
-	struct pci_dev *dev = NULL;
-
-	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
-	for_each_pci_dev(dev) {
-		eeh_addr_cache_insert_dev(dev);
-
-		dn = pci_device_to_OF_node(dev);
-		if (!dn)
-			continue;
-
-		edev = of_node_to_eeh_dev(dn);
-		if (!edev)
-			continue;
-
-		pci_dev_get(dev);  /* matching put is in eeh_remove_device() */
-		dev->dev.archdata.edev = edev;
-		edev->pdev = dev;
-
-		eeh_sysfs_add_device(dev);
-	}
-
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	eeh_addr_cache_print(&pci_io_addr_cache_root);
-#endif
-}
-
diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
deleted file mode 100644
index 1efa28f..0000000
--- a/arch/powerpc/platforms/pseries/eeh_dev.c
+++ /dev/null
@@ -1,112 +0,0 @@
-/*
- * The file intends to implement dynamic creation of EEH device, which will
- * be bound with OF node and PCI device simutaneously. The EEH devices would
- * be foundamental information for EEH core components to work proerly. Besides,
- * We have to support multiple situations where dynamic creation of EEH device
- * is required:
- *
- * 1) Before PCI emunation starts, we need create EEH devices according to the
- *    PCI sensitive OF nodes.
- * 2) When PCI emunation is done, we need do the binding between PCI device and
- *    the associated EEH device.
- * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
- *    will be created while PCI sensitive OF node is detected from DR.
- * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
- *    PHB is newly inserted, we also need create EEH devices accordingly.
- *
- * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-
-#include <linux/export.h>
-#include <linux/gfp.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/string.h>
-
-#include <asm/pci-bridge.h>
-#include <asm/ppc-pci.h>
-
-/**
- * eeh_dev_init - Create EEH device according to OF node
- * @dn: device node
- * @data: PHB
- *
- * It will create EEH device according to the given OF node. The function
- * might be called by PCI emunation, DR, PHB hotplug.
- */
-void *eeh_dev_init(struct device_node *dn, void *data)
-{
-	struct pci_controller *phb = data;
-	struct eeh_dev *edev;
-
-	/* Allocate EEH device */
-	edev = kzalloc(sizeof(*edev), GFP_KERNEL);
-	if (!edev) {
-		pr_warning("%s: out of memory\n", __func__);
-		return NULL;
-	}
-
-	/* Associate EEH device with OF node */
-	PCI_DN(dn)->edev = edev;
-	edev->dn  = dn;
-	edev->phb = phb;
-	INIT_LIST_HEAD(&edev->list);
-
-	return NULL;
-}
-
-/**
- * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
- * @phb: PHB
- *
- * Scan the PHB OF node and its child association, then create the
- * EEH devices accordingly
- */
-void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
-{
-	struct device_node *dn = phb->dn;
-
-	/* EEH PE for PHB */
-	eeh_phb_pe_create(phb);
-
-	/* EEH device for PHB */
-	eeh_dev_init(dn, phb);
-
-	/* EEH devices for children OF nodes */
-	traverse_pci_devices(dn, eeh_dev_init, phb);
-}
-
-/**
- * eeh_dev_phb_init - Create EEH devices for devices included in existing PHBs
- *
- * Scan all the existing PHBs and create EEH devices for their OF
- * nodes and their children OF nodes
- */
-static int __init eeh_dev_phb_init(void)
-{
-	struct pci_controller *phb, *tmp;
-
-	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
-		eeh_dev_phb_init_dynamic(phb);
-
-	pr_info("EEH: devices created\n");
-
-	return 0;
-}
-
-core_initcall(eeh_dev_phb_init);
diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c
deleted file mode 100644
index a3fefb6..0000000
--- a/arch/powerpc/platforms/pseries/eeh_driver.c
+++ /dev/null
@@ -1,552 +0,0 @@
-/*
- * PCI Error Recovery Driver for RPA-compliant PPC64 platform.
- * Copyright IBM Corp. 2004 2005
- * Copyright Linas Vepstas <linas@linas.org> 2004, 2005
- *
- * All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
- */
-#include <linux/delay.h>
-#include <linux/interrupt.h>
-#include <linux/irq.h>
-#include <linux/module.h>
-#include <linux/pci.h>
-#include <asm/eeh.h>
-#include <asm/eeh_event.h>
-#include <asm/ppc-pci.h>
-#include <asm/pci-bridge.h>
-#include <asm/prom.h>
-#include <asm/rtas.h>
-
-/**
- * eeh_pcid_name - Retrieve name of PCI device driver
- * @pdev: PCI device
- *
- * This routine is used to retrieve the name of PCI device driver
- * if that's valid.
- */
-static inline const char *eeh_pcid_name(struct pci_dev *pdev)
-{
-	if (pdev && pdev->dev.driver)
-		return pdev->dev.driver->name;
-	return "";
-}
-
-/**
- * eeh_pcid_get - Get the PCI device driver
- * @pdev: PCI device
- *
- * The function is used to retrieve the PCI device driver for
- * the indicated PCI device. Besides, we will increase the reference
- * of the PCI device driver to prevent that being unloaded on
- * the fly. Otherwise, kernel crash would be seen.
- */
-static inline struct pci_driver *eeh_pcid_get(struct pci_dev *pdev)
-{
-	if (!pdev || !pdev->driver)
-		return NULL;
-
-	if (!try_module_get(pdev->driver->driver.owner))
-		return NULL;
-
-	return pdev->driver;
-}
-
-/**
- * eeh_pcid_put - Dereference on the PCI device driver
- * @pdev: PCI device
- *
- * The function is called to do dereference on the PCI device
- * driver of the indicated PCI device.
- */
-static inline void eeh_pcid_put(struct pci_dev *pdev)
-{
-	if (!pdev || !pdev->driver)
-		return;
-
-	module_put(pdev->driver->driver.owner);
-}
-
-#if 0
-static void print_device_node_tree(struct pci_dn *pdn, int dent)
-{
-	int i;
-	struct device_node *pc;
-
-	if (!pdn)
-		return;
-	for (i = 0; i < dent; i++)
-		printk(" ");
-	printk("dn=%s mode=%x \tcfg_addr=%x pe_addr=%x \tfull=%s\n",
-		pdn->node->name, pdn->eeh_mode, pdn->eeh_config_addr,
-		pdn->eeh_pe_config_addr, pdn->node->full_name);
-	dent += 3;
-	pc = pdn->node->child;
-	while (pc) {
-		print_device_node_tree(PCI_DN(pc), dent);
-		pc = pc->sibling;
-	}
-}
-#endif
-
-/**
- * eeh_disable_irq - Disable interrupt for the recovering device
- * @dev: PCI device
- *
- * This routine must be called when reporting temporary or permanent
- * error to the particular PCI device to disable interrupt of that
- * device. If the device has enabled MSI or MSI-X interrupt, we needn't
- * do real work because EEH should freeze DMA transfers for those PCI
- * devices encountering EEH errors, which includes MSI or MSI-X.
- */
-static void eeh_disable_irq(struct pci_dev *dev)
-{
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
-	/* Don't disable MSI and MSI-X interrupts. They are
-	 * effectively disabled by the DMA Stopped state
-	 * when an EEH error occurs.
-	 */
-	if (dev->msi_enabled || dev->msix_enabled)
-		return;
-
-	if (!irq_has_action(dev->irq))
-		return;
-
-	edev->mode |= EEH_DEV_IRQ_DISABLED;
-	disable_irq_nosync(dev->irq);
-}
-
-/**
- * eeh_enable_irq - Enable interrupt for the recovering device
- * @dev: PCI device
- *
- * This routine must be called to enable interrupt while failed
- * device could be resumed.
- */
-static void eeh_enable_irq(struct pci_dev *dev)
-{
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
-	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
-		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
-		enable_irq(dev->irq);
-	}
-}
-
-/**
- * eeh_report_error - Report pci error to each device driver
- * @data: eeh device
- * @userdata: return value
- * 
- * Report an EEH error to each device driver, collect up and 
- * merge the device driver responses. Cumulative response 
- * passed back in "userdata".
- */
-static void *eeh_report_error(void *data, void *userdata)
-{
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	/* We might not have the associated PCI device,
-	 * then we should continue for next one.
-	 */
-	if (!dev) return NULL;
-	dev->error_state = pci_channel_io_frozen;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) return NULL;
-
-	eeh_disable_irq(dev);
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected) {
-		eeh_pcid_put(dev);
-		return NULL;
-	}
-
-	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
-
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-
-	eeh_pcid_put(dev);
-	return NULL;
-}
-
-/**
- * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
- * @data: eeh device
- * @userdata: return value
- *
- * Tells each device driver that IO ports, MMIO and config space I/O
- * are now enabled. Collects up and merges the device driver responses.
- * Cumulative response passed back in "userdata".
- */
-static void *eeh_report_mmio_enabled(void *data, void *userdata)
-{
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) return NULL;
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->mmio_enabled) {
-		eeh_pcid_put(dev);
-		return NULL;
-	}
-
-	rc = driver->err_handler->mmio_enabled(dev);
-
-	/* A driver that needs a reset trumps all others */
-	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
-
-	eeh_pcid_put(dev);
-	return NULL;
-}
-
-/**
- * eeh_report_reset - Tell device that slot has been reset
- * @data: eeh device
- * @userdata: return value
- *
- * This routine must be called while EEH tries to reset particular
- * PCI device so that the associated PCI device driver could take
- * some actions, usually to save data the driver needs so that the
- * driver can work again while the device is recovered.
- */
-static void *eeh_report_reset(void *data, void *userdata)
-{
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	enum pci_ers_result rc, *res = userdata;
-	struct pci_driver *driver;
-
-	if (!dev) return NULL;
-	dev->error_state = pci_channel_io_normal;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) return NULL;
-
-	eeh_enable_irq(dev);
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->slot_reset) {
-		eeh_pcid_put(dev);
-		return NULL;
-	}
-
-	rc = driver->err_handler->slot_reset(dev);
-	if ((*res == PCI_ERS_RESULT_NONE) ||
-	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
-	if (*res == PCI_ERS_RESULT_DISCONNECT &&
-	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-
-	eeh_pcid_put(dev);
-	return NULL;
-}
-
-/**
- * eeh_report_resume - Tell device to resume normal operations
- * @data: eeh device
- * @userdata: return value
- *
- * This routine must be called to notify the device driver that it
- * could resume so that the device driver can do some initialization
- * to make the recovered device work again.
- */
-static void *eeh_report_resume(void *data, void *userdata)
-{
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	struct pci_driver *driver;
-
-	if (!dev) return NULL;
-	dev->error_state = pci_channel_io_normal;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) return NULL;
-
-	eeh_enable_irq(dev);
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->resume) {
-		eeh_pcid_put(dev);
-		return NULL;
-	}
-
-	driver->err_handler->resume(dev);
-
-	eeh_pcid_put(dev);
-	return NULL;
-}
-
-/**
- * eeh_report_failure - Tell device driver that device is dead.
- * @data: eeh device
- * @userdata: return value
- *
- * This informs the device driver that the device is permanently
- * dead, and that no further recovery attempts will be made on it.
- */
-static void *eeh_report_failure(void *data, void *userdata)
-{
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	struct pci_driver *driver;
-
-	if (!dev) return NULL;
-	dev->error_state = pci_channel_io_perm_failure;
-
-	driver = eeh_pcid_get(dev);
-	if (!driver) return NULL;
-
-	eeh_disable_irq(dev);
-
-	if (!driver->err_handler ||
-	    !driver->err_handler->error_detected) {
-		eeh_pcid_put(dev);
-		return NULL;
-	}
-
-	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
-
-	eeh_pcid_put(dev);
-	return NULL;
-}
-
-/**
- * eeh_reset_device - Perform actual reset of a pci slot
- * @pe: EEH PE
- * @bus: PCI bus corresponding to the isolcated slot
- *
- * This routine must be called to do reset on the indicated PE.
- * During the reset, udev might be invoked because those affected
- * PCI devices will be removed and then added.
- */
-static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
-{
-	int cnt, rc;
-
-	/* pcibios will clear the counter; save the value */
-	cnt = pe->freeze_count;
-
-	/*
-	 * We don't remove the corresponding PE instances because
-	 * we need the information afterwords. The attached EEH
-	 * devices are expected to be attached soon when calling
-	 * into pcibios_add_pci_devices().
-	 */
-	if (bus)
-		__pcibios_remove_pci_devices(bus, 0);
-
-	/* Reset the pci controller. (Asserts RST#; resets config space).
-	 * Reconfigure bridges and devices. Don't try to bring the system
-	 * up if the reset failed for some reason.
-	 */
-	rc = eeh_reset_pe(pe);
-	if (rc)
-		return rc;
-
-	/* Restore PE */
-	eeh_ops->configure_bridge(pe);
-	eeh_pe_restore_bars(pe);
-
-	/* Give the system 5 seconds to finish running the user-space
-	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 
-	 * this is a hack, but if we don't do this, and try to bring 
-	 * the device up before the scripts have taken it down, 
-	 * potentially weird things happen.
-	 */
-	if (bus) {
-		ssleep(5);
-		pcibios_add_pci_devices(bus);
-	}
-	pe->freeze_count = cnt;
-
-	return 0;
-}
-
-/* The longest amount of time to wait for a pci device
- * to come back on line, in seconds.
- */
-#define MAX_WAIT_FOR_RECOVERY 150
-
-/**
- * eeh_handle_event - Reset a PCI device after hard lockup.
- * @pe: EEH PE
- *
- * While PHB detects address or data parity errors on particular PCI
- * slot, the associated PE will be frozen. Besides, DMA's occurring
- * to wild addresses (which usually happen due to bugs in device
- * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
- * #PERR or other misc PCI-related errors also can trigger EEH errors.
- *
- * Recovery process consists of unplugging the device driver (which
- * generated hotplug events to userspace), then issuing a PCI #RST to
- * the device, then reconfiguring the PCI config space for all bridges
- * & devices under this slot, and then finally restarting the device
- * drivers (which cause a second set of hotplug events to go out to
- * userspace).
- */
-void eeh_handle_event(struct eeh_pe *pe)
-{
-	struct pci_bus *frozen_bus;
-	int rc = 0;
-	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
-
-	frozen_bus = eeh_pe_bus_get(pe);
-	if (!frozen_bus) {
-		pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
-			__func__, pe->phb->global_number, pe->addr);
-		return;
-	}
-
-	pe->freeze_count++;
-	if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES)
-		goto excess_failures;
-	pr_warning("EEH: This PCI device has failed %d times in the last hour\n",
-		pe->freeze_count);
-
-	/* Walk the various device drivers attached to this slot through
-	 * a reset sequence, giving each an opportunity to do what it needs
-	 * to accomplish the reset.  Each child gets a report of the
-	 * status ... if any child can't handle the reset, then the entire
-	 * slot is dlpar removed and added.
-	 */
-	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
-
-	/* Get the current PCI slot state. This can take a long time,
-	 * sometimes over 3 seconds for certain systems.
-	 */
-	rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000);
-	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
-		printk(KERN_WARNING "EEH: Permanent failure\n");
-		goto hard_fail;
-	}
-
-	/* Since rtas may enable MMIO when posting the error log,
-	 * don't post the error log until after all dev drivers
-	 * have been informed.
-	 */
-	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
-
-	/* If all device drivers were EEH-unaware, then shut
-	 * down all of the device drivers, and hope they
-	 * go down willingly, without panicing the system.
-	 */
-	if (result == PCI_ERS_RESULT_NONE) {
-		rc = eeh_reset_device(pe, frozen_bus);
-		if (rc) {
-			printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc);
-			goto hard_fail;
-		}
-	}
-
-	/* If all devices reported they can proceed, then re-enable MMIO */
-	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
-
-		if (rc < 0)
-			goto hard_fail;
-		if (rc) {
-			result = PCI_ERS_RESULT_NEED_RESET;
-		} else {
-			result = PCI_ERS_RESULT_NONE;
-			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
-		}
-	}
-
-	/* If all devices reported they can proceed, then re-enable DMA */
-	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
-
-		if (rc < 0)
-			goto hard_fail;
-		if (rc)
-			result = PCI_ERS_RESULT_NEED_RESET;
-		else
-			result = PCI_ERS_RESULT_RECOVERED;
-	}
-
-	/* If any device has a hard failure, then shut off everything. */
-	if (result == PCI_ERS_RESULT_DISCONNECT) {
-		printk(KERN_WARNING "EEH: Device driver gave up\n");
-		goto hard_fail;
-	}
-
-	/* If any device called out for a reset, then reset the slot */
-	if (result == PCI_ERS_RESULT_NEED_RESET) {
-		rc = eeh_reset_device(pe, NULL);
-		if (rc) {
-			printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc);
-			goto hard_fail;
-		}
-		result = PCI_ERS_RESULT_NONE;
-		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
-	}
-
-	/* All devices should claim they have recovered by now. */
-	if ((result != PCI_ERS_RESULT_RECOVERED) &&
-	    (result != PCI_ERS_RESULT_NONE)) {
-		printk(KERN_WARNING "EEH: Not recovered\n");
-		goto hard_fail;
-	}
-
-	/* Tell all device drivers that they can resume operations */
-	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
-
-	return;
-	
-excess_failures:
-	/*
-	 * About 90% of all real-life EEH failures in the field
-	 * are due to poorly seated PCI cards. Only 10% or so are
-	 * due to actual, failed cards.
-	 */
-	pr_err("EEH: PHB#%d-PE#%x has failed %d times in the\n"
-	       "last hour and has been permanently disabled.\n"
-	       "Please try reseating or replacing it.\n",
-		pe->phb->global_number, pe->addr,
-		pe->freeze_count);
-	goto perm_error;
-
-hard_fail:
-	pr_err("EEH: Unable to recover from failure from PHB#%d-PE#%x.\n"
-	       "Please try reseating or replacing it\n",
-		pe->phb->global_number, pe->addr);
-
-perm_error:
-	eeh_slot_error_detail(pe, EEH_LOG_PERM);
-
-	/* Notify all devices that they're about to go down. */
-	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
-
-	/* Shut down the device drivers for good. */
-	if (frozen_bus)
-		pcibios_remove_pci_devices(frozen_bus);
-}
-
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
deleted file mode 100644
index 185bedd..0000000
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ /dev/null
@@ -1,142 +0,0 @@
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- *
- * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
- */
-
-#include <linux/delay.h>
-#include <linux/list.h>
-#include <linux/mutex.h>
-#include <linux/sched.h>
-#include <linux/pci.h>
-#include <linux/slab.h>
-#include <linux/workqueue.h>
-#include <linux/kthread.h>
-#include <asm/eeh_event.h>
-#include <asm/ppc-pci.h>
-
-/** Overview:
- *  EEH error states may be detected within exception handlers;
- *  however, the recovery processing needs to occur asynchronously
- *  in a normal kernel context and not an interrupt context.
- *  This pair of routines creates an event and queues it onto a
- *  work-queue, where a worker thread can drive recovery.
- */
-
-/* EEH event workqueue setup. */
-static DEFINE_SPINLOCK(eeh_eventlist_lock);
-LIST_HEAD(eeh_eventlist);
-static void eeh_thread_launcher(struct work_struct *);
-DECLARE_WORK(eeh_event_wq, eeh_thread_launcher);
-
-/* Serialize reset sequences for a given pci device */
-DEFINE_MUTEX(eeh_event_mutex);
-
-/**
- * eeh_event_handler - Dispatch EEH events.
- * @dummy - unused
- *
- * The detection of a frozen slot can occur inside an interrupt,
- * where it can be hard to do anything about it.  The goal of this
- * routine is to pull these detection events out of the context
- * of the interrupt handler, and re-dispatch them for processing
- * at a later time in a normal context.
- */
-static int eeh_event_handler(void * dummy)
-{
-	unsigned long flags;
-	struct eeh_event *event;
-	struct eeh_pe *pe;
-
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	event = NULL;
-
-	/* Unqueue the event, get ready to process. */
-	if (!list_empty(&eeh_eventlist)) {
-		event = list_entry(eeh_eventlist.next, struct eeh_event, list);
-		list_del(&event->list);
-	}
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	if (event == NULL)
-		return 0;
-
-	/* Serialize processing of EEH events */
-	mutex_lock(&eeh_event_mutex);
-	pe = event->pe;
-	eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
-	pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n",
-		pe->phb->global_number, pe->addr);
-
-	set_current_state(TASK_INTERRUPTIBLE);	/* Don't add to load average */
-	eeh_handle_event(pe);
-	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
-
-	kfree(event);
-	mutex_unlock(&eeh_event_mutex);
-
-	/* If there are no new errors after an hour, clear the counter. */
-	if (pe && pe->freeze_count > 0) {
-		msleep_interruptible(3600*1000);
-		if (pe->freeze_count > 0)
-			pe->freeze_count--;
-
-	}
-
-	return 0;
-}
-
-/**
- * eeh_thread_launcher - Start kernel thread to handle EEH events
- * @dummy - unused
- *
- * This routine is called to start the kernel thread for processing
- * EEH event.
- */
-static void eeh_thread_launcher(struct work_struct *dummy)
-{
-	if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd")))
-		printk(KERN_ERR "Failed to start EEH daemon\n");
-}
-
-/**
- * eeh_send_failure_event - Generate a PCI error event
- * @pe: EEH PE
- *
- * This routine can be called within an interrupt context;
- * the actual event will be delivered in a normal context
- * (from a workqueue).
- */
-int eeh_send_failure_event(struct eeh_pe *pe)
-{
-	unsigned long flags;
-	struct eeh_event *event;
-
-	event = kzalloc(sizeof(*event), GFP_ATOMIC);
-	if (!event) {
-		pr_err("EEH: out of memory, event not handled\n");
-		return -ENOMEM;
-	}
-	event->pe = pe;
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	schedule_work(&eeh_event_wq);
-
-	return 0;
-}
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
deleted file mode 100644
index 9d4a9e8..0000000
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ /dev/null
@@ -1,653 +0,0 @@
-/*
- * The file intends to implement PE based on the information from
- * platforms. Basically, there have 3 types of PEs: PHB/Bus/Device.
- * All the PEs should be organized as hierarchy tree. The first level
- * of the tree will be associated to existing PHBs since the particular
- * PE is only meaningful in one PHB domain.
- *
- * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-
-#include <linux/export.h>
-#include <linux/gfp.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/string.h>
-
-#include <asm/pci-bridge.h>
-#include <asm/ppc-pci.h>
-
-static LIST_HEAD(eeh_phb_pe);
-
-/**
- * eeh_pe_alloc - Allocate PE
- * @phb: PCI controller
- * @type: PE type
- *
- * Allocate PE instance dynamically.
- */
-static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
-{
-	struct eeh_pe *pe;
-
-	/* Allocate PHB PE */
-	pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
-	if (!pe) return NULL;
-
-	/* Initialize PHB PE */
-	pe->type = type;
-	pe->phb = phb;
-	INIT_LIST_HEAD(&pe->child_list);
-	INIT_LIST_HEAD(&pe->child);
-	INIT_LIST_HEAD(&pe->edevs);
-
-	return pe;
-}
-
-/**
- * eeh_phb_pe_create - Create PHB PE
- * @phb: PCI controller
- *
- * The function should be called while the PHB is detected during
- * system boot or PCI hotplug in order to create PHB PE.
- */
-int eeh_phb_pe_create(struct pci_controller *phb)
-{
-	struct eeh_pe *pe;
-
-	/* Allocate PHB PE */
-	pe = eeh_pe_alloc(phb, EEH_PE_PHB);
-	if (!pe) {
-		pr_err("%s: out of memory!\n", __func__);
-		return -ENOMEM;
-	}
-
-	/* Put it into the list */
-	eeh_lock();
-	list_add_tail(&pe->child, &eeh_phb_pe);
-	eeh_unlock();
-
-	pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number);
-
-	return 0;
-}
-
-/**
- * eeh_phb_pe_get - Retrieve PHB PE based on the given PHB
- * @phb: PCI controller
- *
- * The overall PEs form hierarchy tree. The first layer of the
- * hierarchy tree is composed of PHB PEs. The function is used
- * to retrieve the corresponding PHB PE according to the given PHB.
- */
-static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
-{
-	struct eeh_pe *pe;
-
-	list_for_each_entry(pe, &eeh_phb_pe, child) {
-		/*
-		 * Actually, we needn't check the type since
-		 * the PE for PHB has been determined when that
-		 * was created.
-		 */
-		if ((pe->type & EEH_PE_PHB) && pe->phb == phb)
-			return pe;
-	}
-
-	return NULL;
-}
-
-/**
- * eeh_pe_next - Retrieve the next PE in the tree
- * @pe: current PE
- * @root: root PE
- *
- * The function is used to retrieve the next PE in the
- * hierarchy PE tree.
- */
-static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
-				  struct eeh_pe *root)
-{
-	struct list_head *next = pe->child_list.next;
-
-	if (next == &pe->child_list) {
-		while (1) {
-			if (pe == root)
-				return NULL;
-			next = pe->child.next;
-			if (next != &pe->parent->child_list)
-				break;
-			pe = pe->parent;
-		}
-	}
-
-	return list_entry(next, struct eeh_pe, child);
-}
-
-/**
- * eeh_pe_traverse - Traverse PEs in the specified PHB
- * @root: root PE
- * @fn: callback
- * @flag: extra parameter to callback
- *
- * The function is used to traverse the specified PE and its
- * child PEs. The traversing is to be terminated once the
- * callback returns something other than NULL, or no more PEs
- * to be traversed.
- */
-static void *eeh_pe_traverse(struct eeh_pe *root,
-			eeh_traverse_func fn, void *flag)
-{
-	struct eeh_pe *pe;
-	void *ret;
-
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
-		ret = fn(pe, flag);
-		if (ret) return ret;
-	}
-
-	return NULL;
-}
-
-/**
- * eeh_pe_dev_traverse - Traverse the devices from the PE
- * @root: EEH PE
- * @fn: function callback
- * @flag: extra parameter to callback
- *
- * The function is used to traverse the devices of the specified
- * PE and its child PEs.
- */
-void *eeh_pe_dev_traverse(struct eeh_pe *root,
-		eeh_traverse_func fn, void *flag)
-{
-	struct eeh_pe *pe;
-	struct eeh_dev *edev;
-	void *ret;
-
-	if (!root) {
-		pr_warning("%s: Invalid PE %p\n", __func__, root);
-		return NULL;
-	}
-
-	eeh_lock();
-
-	/* Traverse root PE */
-	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
-		eeh_pe_for_each_dev(pe, edev) {
-			ret = fn(edev, flag);
-			if (ret) {
-				eeh_unlock();
-				return ret;
-			}
-		}
-	}
-
-	eeh_unlock();
-
-	return NULL;
-}
-
-/**
- * __eeh_pe_get - Check the PE address
- * @data: EEH PE
- * @flag: EEH device
- *
- * For one particular PE, it can be identified by PE address
- * or tranditional BDF address. BDF address is composed of
- * Bus/Device/Function number. The extra data referred by flag
- * indicates which type of address should be used.
- */
-static void *__eeh_pe_get(void *data, void *flag)
-{
-	struct eeh_pe *pe = (struct eeh_pe *)data;
-	struct eeh_dev *edev = (struct eeh_dev *)flag;
-
-	/* Unexpected PHB PE */
-	if (pe->type & EEH_PE_PHB)
-		return NULL;
-
-	/* We prefer PE address */
-	if (edev->pe_config_addr &&
-	   (edev->pe_config_addr == pe->addr))
-		return pe;
-
-	/* Try BDF address */
-	if (edev->pe_config_addr &&
-	   (edev->config_addr == pe->config_addr))
-		return pe;
-
-	return NULL;
-}
-
-/**
- * eeh_pe_get - Search PE based on the given address
- * @edev: EEH device
- *
- * Search the corresponding PE based on the specified address which
- * is included in the eeh device. The function is used to check if
- * the associated PE has been created against the PE address. It's
- * notable that the PE address has 2 format: traditional PE address
- * which is composed of PCI bus/device/function number, or unified
- * PE address.
- */
-static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
-{
-	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
-	struct eeh_pe *pe;
-
-	pe = eeh_pe_traverse(root, __eeh_pe_get, edev);
-
-	return pe;
-}
-
-/**
- * eeh_pe_get_parent - Retrieve the parent PE
- * @edev: EEH device
- *
- * The whole PEs existing in the system are organized as hierarchy
- * tree. The function is used to retrieve the parent PE according
- * to the parent EEH device.
- */
-static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
-{
-	struct device_node *dn;
-	struct eeh_dev *parent;
-
-	/*
-	 * It might have the case for the indirect parent
-	 * EEH device already having associated PE, but
-	 * the direct parent EEH device doesn't have yet.
-	 */
-	dn = edev->dn->parent;
-	while (dn) {
-		/* We're poking out of PCI territory */
-		if (!PCI_DN(dn)) return NULL;
-
-		parent = of_node_to_eeh_dev(dn);
-		/* We're poking out of PCI territory */
-		if (!parent) return NULL;
-
-		if (parent->pe)
-			return parent->pe;
-
-		dn = dn->parent;
-	}
-
-	return NULL;
-}
-
-/**
- * eeh_add_to_parent_pe - Add EEH device to parent PE
- * @edev: EEH device
- *
- * Add EEH device to the parent PE. If the parent PE already
- * exists, the PE type will be changed to EEH_PE_BUS. Otherwise,
- * we have to create new PE to hold the EEH device and the new
- * PE will be linked to its parent PE as well.
- */
-int eeh_add_to_parent_pe(struct eeh_dev *edev)
-{
-	struct eeh_pe *pe, *parent;
-
-	eeh_lock();
-
-	/*
-	 * Search the PE has been existing or not according
-	 * to the PE address. If that has been existing, the
-	 * PE should be composed of PCI bus and its subordinate
-	 * components.
-	 */
-	pe = eeh_pe_get(edev);
-	if (pe && !(pe->type & EEH_PE_INVALID)) {
-		if (!edev->pe_config_addr) {
-			eeh_unlock();
-			pr_err("%s: PE with addr 0x%x already exists\n",
-				__func__, edev->config_addr);
-			return -EEXIST;
-		}
-
-		/* Mark the PE as type of PCI bus */
-		pe->type = EEH_PE_BUS;
-		edev->pe = pe;
-
-		/* Put the edev to PE */
-		list_add_tail(&edev->list, &pe->edevs);
-		eeh_unlock();
-		pr_debug("EEH: Add %s to Bus PE#%x\n",
-			edev->dn->full_name, pe->addr);
-
-		return 0;
-	} else if (pe && (pe->type & EEH_PE_INVALID)) {
-		list_add_tail(&edev->list, &pe->edevs);
-		edev->pe = pe;
-		/*
-		 * We're running to here because of PCI hotplug caused by
-		 * EEH recovery. We need clear EEH_PE_INVALID until the top.
-		 */
-		parent = pe;
-		while (parent) {
-			if (!(parent->type & EEH_PE_INVALID))
-				break;
-			parent->type &= ~EEH_PE_INVALID;
-			parent = parent->parent;
-		}
-		eeh_unlock();
-		pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
-			edev->dn->full_name, pe->addr, pe->parent->addr);
-
-		return 0;
-	}
-
-	/* Create a new EEH PE */
-	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
-	if (!pe) {
-		eeh_unlock();
-		pr_err("%s: out of memory!\n", __func__);
-		return -ENOMEM;
-	}
-	pe->addr	= edev->pe_config_addr;
-	pe->config_addr	= edev->config_addr;
-
-	/*
-	 * Put the new EEH PE into hierarchy tree. If the parent
-	 * can't be found, the newly created PE will be attached
-	 * to PHB directly. Otherwise, we have to associate the
-	 * PE with its parent.
-	 */
-	parent = eeh_pe_get_parent(edev);
-	if (!parent) {
-		parent = eeh_phb_pe_get(edev->phb);
-		if (!parent) {
-			eeh_unlock();
-			pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
-				__func__, edev->phb->global_number);
-			edev->pe = NULL;
-			kfree(pe);
-			return -EEXIST;
-		}
-	}
-	pe->parent = parent;
-
-	/*
-	 * Put the newly created PE into the child list and
-	 * link the EEH device accordingly.
-	 */
-	list_add_tail(&pe->child, &parent->child_list);
-	list_add_tail(&edev->list, &pe->edevs);
-	edev->pe = pe;
-	eeh_unlock();
-	pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
-		edev->dn->full_name, pe->addr, pe->parent->addr);
-
-	return 0;
-}
-
-/**
- * eeh_rmv_from_parent_pe - Remove one EEH device from the associated PE
- * @edev: EEH device
- * @purge_pe: remove PE or not
- *
- * The PE hierarchy tree might be changed when doing PCI hotplug.
- * Also, the PCI devices or buses could be removed from the system
- * during EEH recovery. So we have to call the function remove the
- * corresponding PE accordingly if necessary.
- */
-int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe)
-{
-	struct eeh_pe *pe, *parent, *child;
-	int cnt;
-
-	if (!edev->pe) {
-		pr_warning("%s: No PE found for EEH device %s\n",
-			__func__, edev->dn->full_name);
-		return -EEXIST;
-	}
-
-	eeh_lock();
-
-	/* Remove the EEH device */
-	pe = edev->pe;
-	edev->pe = NULL;
-	list_del(&edev->list);
-
-	/*
-	 * Check if the parent PE includes any EEH devices.
-	 * If not, we should delete that. Also, we should
-	 * delete the parent PE if it doesn't have associated
-	 * child PEs and EEH devices.
-	 */
-	while (1) {
-		parent = pe->parent;
-		if (pe->type & EEH_PE_PHB)
-			break;
-
-		if (purge_pe) {
-			if (list_empty(&pe->edevs) &&
-			    list_empty(&pe->child_list)) {
-				list_del(&pe->child);
-				kfree(pe);
-			} else {
-				break;
-			}
-		} else {
-			if (list_empty(&pe->edevs)) {
-				cnt = 0;
-				list_for_each_entry(child, &pe->child_list, child) {
-					if (!(child->type & EEH_PE_INVALID)) {
-						cnt++;
-						break;
-					}
-				}
-
-				if (!cnt)
-					pe->type |= EEH_PE_INVALID;
-				else
-					break;
-			}
-		}
-
-		pe = parent;
-	}
-
-	eeh_unlock();
-
-	return 0;
-}
-
-/**
- * __eeh_pe_state_mark - Mark the state for the PE
- * @data: EEH PE
- * @flag: state
- *
- * The function is used to mark the indicated state for the given
- * PE. Also, the associated PCI devices will be put into IO frozen
- * state as well.
- */
-static void *__eeh_pe_state_mark(void *data, void *flag)
-{
-	struct eeh_pe *pe = (struct eeh_pe *)data;
-	int state = *((int *)flag);
-	struct eeh_dev *tmp;
-	struct pci_dev *pdev;
-
-	/*
-	 * Mark the PE with the indicated state. Also,
-	 * the associated PCI device will be put into
-	 * I/O frozen state to avoid I/O accesses from
-	 * the PCI device driver.
-	 */
-	pe->state |= state;
-	eeh_pe_for_each_dev(pe, tmp) {
-		pdev = eeh_dev_to_pci_dev(tmp);
-		if (pdev)
-			pdev->error_state = pci_channel_io_frozen;
-	}
-
-	return NULL;
-}
-
-/**
- * eeh_pe_state_mark - Mark specified state for PE and its associated device
- * @pe: EEH PE
- *
- * EEH error affects the current PE and its child PEs. The function
- * is used to mark appropriate state for the affected PEs and the
- * associated devices.
- */
-void eeh_pe_state_mark(struct eeh_pe *pe, int state)
-{
-	eeh_lock();
-	eeh_pe_traverse(pe, __eeh_pe_state_mark, &state);
-	eeh_unlock();
-}
-
-/**
- * __eeh_pe_state_clear - Clear state for the PE
- * @data: EEH PE
- * @flag: state
- *
- * The function is used to clear the indicated state from the
- * given PE. Besides, we also clear the check count of the PE
- * as well.
- */
-static void *__eeh_pe_state_clear(void *data, void *flag)
-{
-	struct eeh_pe *pe = (struct eeh_pe *)data;
-	int state = *((int *)flag);
-
-	pe->state &= ~state;
-	pe->check_count = 0;
-
-	return NULL;
-}
-
-/**
- * eeh_pe_state_clear - Clear state for the PE and its children
- * @pe: PE
- * @state: state to be cleared
- *
- * When the PE and its children has been recovered from error,
- * we need clear the error state for that. The function is used
- * for the purpose.
- */
-void eeh_pe_state_clear(struct eeh_pe *pe, int state)
-{
-	eeh_lock();
-	eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
-	eeh_unlock();
-}
-
-/**
- * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
- * @data: EEH device
- * @flag: Unused
- *
- * Loads the PCI configuration space base address registers,
- * the expansion ROM base address, the latency timer, and etc.
- * from the saved values in the device node.
- */
-static void *eeh_restore_one_device_bars(void *data, void *flag)
-{
-	int i;
-	u32 cmd;
-	struct eeh_dev *edev = (struct eeh_dev *)data;
-	struct device_node *dn = eeh_dev_to_of_node(edev);
-
-	for (i = 4; i < 10; i++)
-		eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
-	/* 12 == Expansion ROM Address */
-	eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]);
-
-#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
-#define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
-
-	eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1,
-		SAVED_BYTE(PCI_CACHE_LINE_SIZE));
-	eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1,
-		SAVED_BYTE(PCI_LATENCY_TIMER));
-
-	/* max latency, min grant, interrupt pin and line */
-	eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]);
-
-	/*
-	 * Restore PERR & SERR bits, some devices require it,
-	 * don't touch the other command bits
-	 */
-	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
-	if (edev->config_space[1] & PCI_COMMAND_PARITY)
-		cmd |= PCI_COMMAND_PARITY;
-	else
-		cmd &= ~PCI_COMMAND_PARITY;
-	if (edev->config_space[1] & PCI_COMMAND_SERR)
-		cmd |= PCI_COMMAND_SERR;
-	else
-		cmd &= ~PCI_COMMAND_SERR;
-	eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
-
-	return NULL;
-}
-
-/**
- * eeh_pe_restore_bars - Restore the PCI config space info
- * @pe: EEH PE
- *
- * This routine performs a recursive walk to the children
- * of this device as well.
- */
-void eeh_pe_restore_bars(struct eeh_pe *pe)
-{
-	/*
-	 * We needn't take the EEH lock since eeh_pe_dev_traverse()
-	 * will take that.
-	 */
-	eeh_pe_dev_traverse(pe, eeh_restore_one_device_bars, NULL);
-}
-
-/**
- * eeh_pe_bus_get - Retrieve PCI bus according to the given PE
- * @pe: EEH PE
- *
- * Retrieve the PCI bus according to the given PE. Basically,
- * there're 3 types of PEs: PHB/Bus/Device. For PHB PE, the
- * primary PCI bus will be retrieved. The parent bus will be
- * returned for BUS PE. However, we don't have associated PCI
- * bus for DEVICE PE.
- */
-struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
-{
-	struct pci_bus *bus = NULL;
-	struct eeh_dev *edev;
-	struct pci_dev *pdev;
-
-	eeh_lock();
-
-	if (pe->type & EEH_PE_PHB) {
-		bus = pe->phb->bus;
-	} else if (pe->type & EEH_PE_BUS ||
-		   pe->type & EEH_PE_DEVICE) {
-		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
-		pdev = eeh_dev_to_pci_dev(edev);
-		if (pdev)
-			bus = pdev->bus;
-	}
-
-	eeh_unlock();
-
-	return bus;
-}
diff --git a/arch/powerpc/platforms/pseries/eeh_sysfs.c b/arch/powerpc/platforms/pseries/eeh_sysfs.c
deleted file mode 100644
index d377083..0000000
--- a/arch/powerpc/platforms/pseries/eeh_sysfs.c
+++ /dev/null
@@ -1,75 +0,0 @@
-/*
- * Sysfs entries for PCI Error Recovery for PAPR-compliant platform.
- * Copyright IBM Corporation 2007
- * Copyright Linas Vepstas <linas@austin.ibm.com> 2007
- *
- * All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
- */
-#include <linux/pci.h>
-#include <linux/stat.h>
-#include <asm/ppc-pci.h>
-#include <asm/pci-bridge.h>
-
-/**
- * EEH_SHOW_ATTR -- Create sysfs entry for eeh statistic
- * @_name: name of file in sysfs directory
- * @_memb: name of member in struct pci_dn to access
- * @_format: printf format for display
- *
- * All of the attributes look very similar, so just
- * auto-gen a cut-n-paste routine to display them.
- */
-#define EEH_SHOW_ATTR(_name,_memb,_format)               \
-static ssize_t eeh_show_##_name(struct device *dev,      \
-		struct device_attribute *attr, char *buf)          \
-{                                                        \
-	struct pci_dev *pdev = to_pci_dev(dev);               \
-	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);      \
-	                                                      \
-	if (!edev)                                            \
-		return 0;                                     \
-	                                                      \
-	return sprintf(buf, _format "\n", edev->_memb);       \
-}                                                        \
-static DEVICE_ATTR(_name, S_IRUGO, eeh_show_##_name, NULL);
-
-EEH_SHOW_ATTR(eeh_mode,            mode,            "0x%x");
-EEH_SHOW_ATTR(eeh_config_addr,     config_addr,     "0x%x");
-EEH_SHOW_ATTR(eeh_pe_config_addr,  pe_config_addr,  "0x%x");
-
-void eeh_sysfs_add_device(struct pci_dev *pdev)
-{
-	int rc=0;
-
-	rc += device_create_file(&pdev->dev, &dev_attr_eeh_mode);
-	rc += device_create_file(&pdev->dev, &dev_attr_eeh_config_addr);
-	rc += device_create_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
-
-	if (rc)
-		printk(KERN_WARNING "EEH: Unable to create sysfs entries\n");
-}
-
-void eeh_sysfs_remove_device(struct pci_dev *pdev)
-{
-	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
-	device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr);
-	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
-}
-
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index c91b22b..efe6137 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -64,91 +64,6 @@ pcibios_find_pci_bus(struct device_node *dn)
 }
 EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
 
-/**
- * __pcibios_remove_pci_devices - remove all devices under this bus
- * @bus: the indicated PCI bus
- * @purge_pe: destroy the PE on removal of PCI devices
- *
- * Remove all of the PCI devices under this bus both from the
- * linux pci device tree, and from the powerpc EEH address cache.
- * By default, the corresponding PE will be destroied during the
- * normal PCI hotplug path. For PCI hotplug during EEH recovery,
- * the corresponding PE won't be destroied and deallocated.
- */
-void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe)
-{
-	struct pci_dev *dev, *tmp;
-	struct pci_bus *child_bus;
-
-	/* First go down child busses */
-	list_for_each_entry(child_bus, &bus->children, node)
-		__pcibios_remove_pci_devices(child_bus, purge_pe);
-
-	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
-		pci_domain_nr(bus),  bus->number);
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
-		pr_debug("     * Removing %s...\n", pci_name(dev));
-		eeh_remove_bus_device(dev, purge_pe);
-		pci_stop_and_remove_bus_device(dev);
-	}
-}
-
-/**
- * pcibios_remove_pci_devices - remove all devices under this bus
- *
- * Remove all of the PCI devices under this bus both from the
- * linux pci device tree, and from the powerpc EEH address cache.
- */
-void pcibios_remove_pci_devices(struct pci_bus *bus)
-{
-	__pcibios_remove_pci_devices(bus, 1);
-}
-EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
-
-/**
- * pcibios_add_pci_devices - adds new pci devices to bus
- *
- * This routine will find and fixup new pci devices under
- * the indicated bus. This routine presumes that there
- * might already be some devices under this bridge, so
- * it carefully tries to add only new devices.  (And that
- * is how this routine differs from other, similar pcibios
- * routines.)
- */
-void pcibios_add_pci_devices(struct pci_bus * bus)
-{
-	int slotno, num, mode, pass, max;
-	struct pci_dev *dev;
-	struct device_node *dn = pci_bus_to_OF_node(bus);
-
-	eeh_add_device_tree_early(dn);
-
-	mode = PCI_PROBE_NORMAL;
-	if (ppc_md.pci_probe_mode)
-		mode = ppc_md.pci_probe_mode(bus);
-
-	if (mode == PCI_PROBE_DEVTREE) {
-		/* use ofdt-based probe */
-		of_rescan_bus(dn, bus);
-	} else if (mode == PCI_PROBE_NORMAL) {
-		/* use legacy probe */
-		slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
-		num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
-		if (!num)
-			return;
-		pcibios_setup_bus_devices(bus);
-		max = bus->busn_res.start;
-		for (pass=0; pass < 2; pass++)
-			list_for_each_entry(dev, &bus->devices, bus_list) {
-			if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
-			    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
-				max = pci_scan_bridge(bus, dev, max, pass);
-		}
-	}
-	pcibios_finish_adding_to_bus(bus);
-}
-EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/27] powerpc/eeh: Cleanup for EEH core
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
  2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While moving EEH core around from pSeries platform directory to
arch/powerpc/kernel (in previous one patch), there has lots of
complaints for coding style from "git show". The patch is going
to fix them.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c        |   22 +++++++++++-----------
 arch/powerpc/kernel/eeh_driver.c |   14 +++++++-------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 6b73d6c..8a83451 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -368,7 +368,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	}
 
 	eeh_stats.slot_resets++;
- 
+
 	/* Avoid repeated reports of this failure, including problems
 	 * with other functions on this device, and functions under
 	 * bridges.
@@ -525,7 +525,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
 	 * or a fundamental reset (3).
 	 * A fundamental reset required by any device under
 	 * Partitionable Endpoint trumps hot-reset.
-  	 */
+	 */
 	eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
 
 	if (freset)
@@ -538,8 +538,8 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
 	 */
 #define PCI_BUS_RST_HOLD_TIME_MSEC 250
 	msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
-	
-	/* We might get hit with another EEH freeze as soon as the 
+
+	/* We might get hit with another EEH freeze as soon as the
 	 * pci slot reset line is dropped. Make sure we don't miss
 	 * these, and clear the flag now.
 	 */
@@ -604,7 +604,7 @@ void eeh_save_bars(struct eeh_dev *edev)
 	if (!edev)
 		return;
 	dn = eeh_dev_to_of_node(edev);
-	
+
 	for (i = 0; i < 16; i++)
 		eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
 }
@@ -803,12 +803,12 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
- 		eeh_add_device_late(dev);
- 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
- 			struct pci_bus *subbus = dev->subordinate;
- 			if (subbus)
- 				eeh_add_device_tree_late(subbus);
- 		}
+		eeh_add_device_late(dev);
+		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+			struct pci_bus *subbus = dev->subordinate;
+			if (subbus)
+				eeh_add_device_tree_late(subbus);
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index a3fefb6..0acc5a2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -154,9 +154,9 @@ static void eeh_enable_irq(struct pci_dev *dev)
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
  * @userdata: return value
- * 
- * Report an EEH error to each device driver, collect up and 
- * merge the device driver responses. Cumulative response 
+ *
+ * Report an EEH error to each device driver, collect up and
+ * merge the device driver responses. Cumulative response
  * passed back in "userdata".
  */
 static void *eeh_report_error(void *data, void *userdata)
@@ -376,9 +376,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	eeh_pe_restore_bars(pe);
 
 	/* Give the system 5 seconds to finish running the user-space
-	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 
-	 * this is a hack, but if we don't do this, and try to bring 
-	 * the device up before the scripts have taken it down, 
+	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
+	 * this is a hack, but if we don't do this, and try to bring
+	 * the device up before the scripts have taken it down,
 	 * potentially weird things happen.
 	 */
 	if (bus) {
@@ -520,7 +520,7 @@ void eeh_handle_event(struct eeh_pe *pe)
 	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
 	return;
-	
+
 excess_failures:
 	/*
 	 * About 90% of all real-life EEH failures in the field
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
  2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
  2013-06-15  9:02 ` [PATCH 02/27] powerpc/eeh: Cleanup for EEH core Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

One of the possible cases indicated by P7IOC interrupt is fenced
PHB. For that case, we need fetch the PE corresponding to the PHB
and disable the PHB and all subordinate PCI buses/devices, recover
from the fenced state and eventually enable the whole PHB. We need
one function to fetch the PHB PE outside eeh_pe.c and the patch is
going to make eeh_phb_pe_get() public for that purpose.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/eeh_pe.c   |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e32c3c5..4ac6f70 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -184,6 +184,7 @@ static inline void eeh_unlock(void)
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 9d4a9e8..71c4544 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -95,7 +95,7 @@ int eeh_phb_pe_create(struct pci_controller *phb)
  * hierarchy tree is composed of PHB PEs. The function is used
  * to retrieve the corresponding PHB PE according to the given PHB.
  */
-static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
 {
 	struct eeh_pe *pe;
 
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (2 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While processing EEH event interrupt from P7IOC, we need function
to retrieve the PE according to the indicated EEH device. The patch
makes function eeh_pe_get() public so that other source files can call
it for that purpose. Also, the patch fixes referring to wrong BDF
(Bus/Device/Function) address while searching PE in function
__eeh_pe_get().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/eeh_pe.c   |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 4ac6f70..acdfcaa 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -185,6 +185,7 @@ static inline void eeh_unlock(void)
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 71c4544..3d2dcf5 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -228,7 +228,7 @@ static void *__eeh_pe_get(void *data, void *flag)
 		return pe;
 
 	/* Try BDF address */
-	if (edev->pe_config_addr &&
+	if (edev->config_addr &&
 	   (edev->config_addr == pe->config_addr))
 		return pe;
 
@@ -246,7 +246,7 @@ static void *__eeh_pe_get(void *data, void *flag)
  * which is composed of PCI bus/device/function number, or unified
  * PE address.
  */
-static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
 {
 	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
 	struct eeh_pe *pe;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (3 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

There're several types of PEs can be supported for now: PHB, Bus
and Device dependent PE. For PCI bus dependent PE, tracing the
corresponding PCI bus from PE (struct eeh_pe) would make the code
more efficient. The patch also enables the retrieval of PCI bus based
on the PCI bus dependent PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/eeh_pe.c   |   22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index acdfcaa..f3b49d6 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -59,6 +59,7 @@ struct eeh_pe {
 	int config_addr;		/* Traditional PCI address	*/
 	int addr;			/* PE configuration address	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
+	struct pci_bus *bus;		/* Top PCI bus for bus PE	*/
 	int check_count;		/* Times of ignored error	*/
 	int freeze_count;		/* Times of froze up		*/
 	int false_positives;		/* Times of reported #ff's	*/
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 3d2dcf5..5bd1637 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -304,6 +304,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 int eeh_add_to_parent_pe(struct eeh_dev *edev)
 {
 	struct eeh_pe *pe, *parent;
+	struct eeh_dev *first_edev;
 
 	eeh_lock();
 
@@ -326,6 +327,21 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 		pe->type = EEH_PE_BUS;
 		edev->pe = pe;
 
+		/*
+		 * For PCI bus sensitive PE, we can reset the parent
+		 * bridge in order for hot-reset. However, the PCI
+		 * devices including the associated EEH devices might
+		 * be removed when EEH core is doing recovery. So that
+		 * won't safe to retrieve the bridge through downstream
+		 * EEH device. We have to trace the parent PCI bus, then
+		 * the parent bridge explicitly.
+		 */
+		if (eeh_probe_mode_dev() && !pe->bus) {
+			first_edev = list_first_entry(&pe->edevs,
+						      struct eeh_dev, list);
+			pe->bus = eeh_dev_to_pci_dev(first_edev)->bus;
+		}
+
 		/* Put the edev to PE */
 		list_add_tail(&edev->list, &pe->edevs);
 		eeh_unlock();
@@ -641,12 +657,18 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
 		   pe->type & EEH_PE_DEVICE) {
+		if (pe->bus) {
+			bus = pe->bus;
+			goto out;
+		}
+
 		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		pdev = eeh_dev_to_pci_dev(edev);
 		if (pdev)
 			bus = pdev->bus;
 	}
 
+out:
 	eeh_unlock();
 
 	return bus;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/27] powerpc/eeh: Make eeh_init() public
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (4 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

For EEH on PowerNV platform, we will do EEH probe based on the
real PCI devices. The PCI devices are available after PCI probe.
So we have to call eeh_init() explicitly on PowerNV platform
after PCI probe. The patch also does EEH probe for PowerNV platform
in eeh_init().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    8 +++++++-
 arch/powerpc/kernel/eeh.c      |   22 ++++++++++++++++++++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f3b49d6..beb3cbc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -132,7 +132,7 @@ struct eeh_ops {
 	char *name;
 	int (*init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
-	void* (*dev_probe)(struct pci_dev *dev, void *flag);
+	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
 	int (*get_pe_addr)(struct eeh_pe *pe);
 	int (*get_state)(struct eeh_pe *pe, int *state);
@@ -196,6 +196,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+int __init eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
@@ -224,6 +225,11 @@ void eeh_remove_bus_device(struct pci_dev *, int);
 
 #else /* !CONFIG_EEH */
 
+static inline int eeh_init(void)
+{
+	return 0;
+}
+
 static inline void *eeh_dev_init(struct device_node *dn, void *data)
 {
 	return NULL;
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 8a83451..c865c5f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -674,11 +674,21 @@ int __exit eeh_ops_unregister(const char *name)
  * Even if force-off is set, the EEH hardware is still enabled, so that
  * newer systems can boot.
  */
-static int __init eeh_init(void)
+int __init eeh_init(void)
 {
 	struct pci_controller *hose, *tmp;
 	struct device_node *phb;
-	int ret;
+	static int cnt = 0;
+	int ret = 0;
+
+	/*
+	 * We have to delay the initialization on PowerNV after
+	 * the PCI hierarchy tree has been built because the PEs
+	 * are figured out based on PCI devices instead of device
+	 * tree nodes
+	 */
+	if (machine_is(powernv) && cnt++ <= 0)
+		return ret;
 
 	/* call platform initialization function */
 	if (!eeh_ops) {
@@ -700,6 +710,14 @@ static int __init eeh_init(void)
 			phb = hose->dn;
 			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
 		}
+	} else if (eeh_probe_mode_dev()) {
+		list_for_each_entry_safe(hose, tmp,
+			&hose_list, list_node)
+			pci_walk_bus(hose->bus, eeh_ops->dev_probe, NULL);
+	} else {
+		pr_warning("%s: Invalid probe mode %d\n",
+			   __func__, eeh_probe_mode);
+		return -EINVAL;
 	}
 
 	if (eeh_subsystem_enabled)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/27] powerpc/eeh: EEH post initialization operation
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (5 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:02 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds new EEH operation post_init. It's used to notify
the platform that EEH core has completed the EEH probe. By that,
PowerNV platform starts to use the services supplied by EEH
functionality.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |    1 +
 arch/powerpc/kernel/eeh.c      |   11 +++++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index beb3cbc..beec788 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -131,6 +131,7 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
+	int (*post_init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
 	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c865c5f..a29cf47 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -720,6 +720,17 @@ int __init eeh_init(void)
 		return -EINVAL;
 	}
 
+	/*
+	 * Call platform post-initialization. Actually, It's good chance
+	 * to inform platform that EEH is ready to supply service if the
+	 * I/O cache stuff has been built up.
+	 */
+	if (eeh_ops->post_init) {
+		ret = eeh_ops->post_init();
+		if (ret)
+			return ret;
+	}
+
 	if (eeh_subsystem_enabled)
 		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
 	else
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once()
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (6 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
@ 2013-06-15  9:02 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

We shouldn't check that the returned PE status is exactly equal to
(EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) but instead only check
that they are both set.

[benh: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index a29cf47..cda0b62 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -565,6 +565,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
  */
 int eeh_reset_pe(struct eeh_pe *pe)
 {
+	int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 	int i, rc;
 
 	/* Take three shots at resetting the bus */
@@ -572,7 +573,7 @@ int eeh_reset_pe(struct eeh_pe *pe)
 		eeh_reset_pe_once(pe);
 
 		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
+		if ((rc & flags) == flags)
 			return 0;
 
 		if (rc < 0) {
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (7 preceding siblings ...)
  2013-06-15  9:02 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 10/27] powerpc/eeh: Export confirm_error_lock Gavin Shan
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While doing EEH recovery, the PCI devices of the problematic PE
should be removed and then added to the system again. During the
so-called hotplug event, the PCI devices of the problematic PE
will be probed through early/late phase. We would delay EEH probe
on late point for PowerNV platform since the PCI device isn't
available in early phase.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index cda0b62..7d169d3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -758,6 +758,14 @@ static void eeh_add_device_early(struct device_node *dn)
 {
 	struct pci_controller *phb;
 
+	/*
+	 * If we're doing EEH probe based on PCI device, we
+	 * would delay the probe until late stage because
+	 * the PCI device isn't available this moment.
+	 */
+	if (!eeh_probe_mode_devtree())
+		return;
+
 	if (!of_node_to_eeh_dev(dn))
 		return;
 	phb = of_node_to_eeh_dev(dn)->phb;
@@ -766,7 +774,6 @@ static void eeh_add_device_early(struct device_node *dn)
 	if (NULL == phb || 0 == phb->buid)
 		return;
 
-	/* FIXME: hotplug support on POWERNV */
 	eeh_ops->of_probe(dn, NULL);
 }
 
@@ -817,6 +824,13 @@ static void eeh_add_device_late(struct pci_dev *dev)
 	edev->pdev = dev;
 	dev->dev.archdata.edev = edev;
 
+	/*
+	 * We have to do the EEH probe here because the PCI device
+	 * hasn't been created yet in the early stage.
+	 */
+	if (eeh_probe_mode_dev())
+		eeh_ops->dev_probe(dev, NULL);
+
 	eeh_addr_cache_insert_dev(dev);
 }
 
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/27] powerpc/eeh: Export confirm_error_lock
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (8 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

An EEH event is created and queued to the event queue for each
ingress EEH error. When there're mutiple EEH errors, we need serialize
the process to keep consistent PE state (flags). The spinlock
"confirm_error_lock" was introduced for the purpose. We'll inject
EEH event upon error reporting interrupts on PowerNV platform. So
we export the spinlock for that to use for consistent PE state.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |   11 +++++++++++
 arch/powerpc/kernel/eeh.c      |   10 ++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index beec788..7ebf522 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -148,6 +148,7 @@ struct eeh_ops {
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
 extern struct mutex eeh_mutex;
+extern raw_spinlock_t confirm_error_lock;
 extern int eeh_probe_mode;
 
 #define EEH_PROBE_MODE_DEV	(1<<0)	/* From PCI device	*/
@@ -178,6 +179,16 @@ static inline void eeh_unlock(void)
 	mutex_unlock(&eeh_mutex);
 }
 
+static inline void eeh_serialize_lock(unsigned long *flags)
+{
+	raw_spin_lock_irqsave(&confirm_error_lock, *flags);
+}
+
+static inline void eeh_serialize_unlock(unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
 /*
  * Max number of EEH freezes allowed before we consider the device
  * to be permanently disabled.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 7d169d3..f7cbeae 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -107,7 +107,7 @@ int eeh_probe_mode;
 DEFINE_MUTEX(eeh_mutex);
 
 /* Lock to avoid races due to multiple reports of an error */
-static DEFINE_RAW_SPINLOCK(confirm_error_lock);
+DEFINE_RAW_SPINLOCK(confirm_error_lock);
 
 /* Buffer for reporting pci register dumps. Its here in BSS, and
  * not dynamically alloced, so that it ends up in RMO where RTAS
@@ -325,7 +325,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * in one slot might report errors simultaneously, and we
 	 * only want one error recovery routine running.
 	 */
-	raw_spin_lock_irqsave(&confirm_error_lock, flags);
+	eeh_serialize_lock(&flags);
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
@@ -374,7 +374,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * bridges.
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
-	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+	eeh_serialize_unlock(flags);
 
 	eeh_send_failure_event(pe);
 
@@ -386,7 +386,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	return 1;
 
 dn_unlock:
-	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+	eeh_serialize_unlock(flags);
 	return rc;
 }
 
@@ -702,8 +702,6 @@ int __init eeh_init(void)
 		return ret;
 	}
 
-	raw_spin_lock_init(&confirm_error_lock);
-
 	/* Enable EEH for all adapters */
 	if (eeh_probe_mode_devtree()) {
 		list_for_each_entry_safe(hose, tmp,
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (9 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 10/27] powerpc/eeh: Export confirm_error_lock Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch synchronizes OPAL APIs between kernel and firmware. Also,
we starts to replace opal_pci_get_phb_diag_data() with the similar
opal_pci_get_phb_diag_data2() and the former OPAL API would return
OPAL_UNSUPPORTED from now on.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                |  135 ++++++++++++++++++++----
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/pci.c           |    3 +-
 3 files changed, 119 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index cbb9305..2880797 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -117,7 +117,13 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_SET_SLOT_LED_STATUS		55
 #define OPAL_GET_EPOW_STATUS			56
 #define OPAL_SET_SYSTEM_ATTENTION_LED		57
+#define OPAL_RESERVED1				58
+#define OPAL_RESERVED2				59
+#define OPAL_PCI_NEXT_ERROR			60
+#define OPAL_PCI_EEH_FREEZE_STATUS2		61
+#define OPAL_PCI_POLL				62
 #define OPAL_PCI_MSI_EOI			63
+#define OPAL_PCI_GET_PHB_DIAG_DATA2		64
 
 #ifndef __ASSEMBLY__
 
@@ -125,6 +131,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 enum OpalVendorApiTokens {
 	OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
 };
+
 enum OpalFreezeState {
 	OPAL_EEH_STOPPED_NOT_FROZEN = 0,
 	OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
@@ -134,55 +141,69 @@ enum OpalFreezeState {
 	OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
 	OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
 };
+
 enum OpalEehFreezeActionToken {
 	OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
 };
+
 enum OpalPciStatusToken {
-	OPAL_EEH_PHB_NO_ERROR = 0,
-	OPAL_EEH_PHB_FATAL = 1,
-	OPAL_EEH_PHB_RECOVERABLE = 2,
-	OPAL_EEH_PHB_BUS_ERROR = 3,
-	OPAL_EEH_PCI_NO_DEVSEL = 4,
-	OPAL_EEH_PCI_TA = 5,
-	OPAL_EEH_PCIEX_UR = 6,
-	OPAL_EEH_PCIEX_CA = 7,
-	OPAL_EEH_PCI_MMIO_ERROR = 8,
-	OPAL_EEH_PCI_DMA_ERROR = 9
+	OPAL_EEH_NO_ERROR	= 0,
+	OPAL_EEH_IOC_ERROR	= 1,
+	OPAL_EEH_PHB_ERROR	= 2,
+	OPAL_EEH_PE_ERROR	= 3,
+	OPAL_EEH_PE_MMIO_ERROR	= 4,
+	OPAL_EEH_PE_DMA_ERROR	= 5
 };
+
+enum OpalPciErrorSeverity {
+	OPAL_EEH_SEV_NO_ERROR	= 0,
+	OPAL_EEH_SEV_IOC_DEAD	= 1,
+	OPAL_EEH_SEV_PHB_DEAD	= 2,
+	OPAL_EEH_SEV_PHB_FENCED	= 3,
+	OPAL_EEH_SEV_PE_ER	= 4,
+	OPAL_EEH_SEV_INF	= 5
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
 };
+
 enum OpalShpcLinkState {
 	OPAL_SHPC_LINK_DOWN = 0,
 	OPAL_SHPC_LINK_UP = 1
 };
+
 enum OpalMmioWindowType {
 	OPAL_M32_WINDOW_TYPE = 1,
 	OPAL_M64_WINDOW_TYPE = 2,
 	OPAL_IO_WINDOW_TYPE = 3
 };
+
 enum OpalShpcSlotState {
 	OPAL_SHPC_DEV_NOT_PRESENT = 0,
 	OPAL_SHPC_DEV_PRESENT = 1
 };
+
 enum OpalExceptionHandler {
 	OPAL_MACHINE_CHECK_HANDLER = 1,
 	OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
 	OPAL_SOFTPATCH_HANDLER = 3
 };
+
 enum OpalPendingState {
-	OPAL_EVENT_OPAL_INTERNAL = 0x1,
-	OPAL_EVENT_NVRAM = 0x2,
-	OPAL_EVENT_RTC = 0x4,
-	OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
-	OPAL_EVENT_CONSOLE_INPUT = 0x10,
-	OPAL_EVENT_ERROR_LOG_AVAIL = 0x20,
-	OPAL_EVENT_ERROR_LOG = 0x40,
-	OPAL_EVENT_EPOW = 0x80,
-	OPAL_EVENT_LED_STATUS = 0x100
+	OPAL_EVENT_OPAL_INTERNAL	= 0x1,
+	OPAL_EVENT_NVRAM		= 0x2,
+	OPAL_EVENT_RTC			= 0x4,
+	OPAL_EVENT_CONSOLE_OUTPUT	= 0x8,
+	OPAL_EVENT_CONSOLE_INPUT	= 0x10,
+	OPAL_EVENT_ERROR_LOG_AVAIL	= 0x20,
+	OPAL_EVENT_ERROR_LOG		= 0x40,
+	OPAL_EVENT_EPOW			= 0x80,
+	OPAL_EVENT_LED_STATUS		= 0x100,
+	OPAL_EVENT_PCI_ERROR		= 0x200
 };
 
 /* Machine check related definitions */
@@ -364,15 +385,80 @@ struct opal_machine_check_event {
 	} u;
 };
 
+enum {
+	OPAL_P7IOC_DIAG_TYPE_NONE	= 0,
+	OPAL_P7IOC_DIAG_TYPE_RGC	= 1,
+	OPAL_P7IOC_DIAG_TYPE_BI		= 2,
+	OPAL_P7IOC_DIAG_TYPE_CI		= 3,
+	OPAL_P7IOC_DIAG_TYPE_MISC	= 4,
+	OPAL_P7IOC_DIAG_TYPE_I2C	= 5,
+	OPAL_P7IOC_DIAG_TYPE_LAST	= 6
+};
+
+struct OpalIoP7IOCErrorData {
+	uint16_t type;
+
+	/* GEM */
+	uint64_t gemXfir;
+	uint64_t gemRfir;
+	uint64_t gemRirqfir;
+	uint64_t gemMask;
+	uint64_t gemRwof;
+
+	/* LEM */
+	uint64_t lemFir;
+	uint64_t lemErrMask;
+	uint64_t lemAction0;
+	uint64_t lemAction1;
+	uint64_t lemWof;
+
+	union {
+		struct OpalIoP7IOCRgcErrorData {
+			uint64_t rgcStatus;		/* 3E1C10 */
+			uint64_t rgcLdcp;		/* 3E1C18 */
+		}rgc;
+		struct OpalIoP7IOCBiErrorData {
+			uint64_t biLdcp0;		/* 3C0100, 3C0118 */
+			uint64_t biLdcp1;		/* 3C0108, 3C0120 */
+			uint64_t biLdcp2;		/* 3C0110, 3C0128 */
+			uint64_t biFenceStatus;		/* 3C0130, 3C0130 */
+
+			uint8_t  biDownbound;		/* BI Downbound or Upbound */
+		}bi;
+		struct OpalIoP7IOCCiErrorData {
+			uint64_t ciPortStatus;		/* 3Dn008 */
+			uint64_t ciPortLdcp;		/* 3Dn010 */
+
+			uint8_t	 ciPort;		/* Index of CI port: 0/1 */
+		}ci;
+	};
+};
+
 /**
  * This structure defines the overlay which will be used to store PHB error
  * data upon request.
  */
 enum {
+	OPAL_PHB_ERROR_DATA_VERSION_1 = 1,
+};
+
+enum {
+	OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1,
+};
+
+enum {
 	OPAL_P7IOC_NUM_PEST_REGS = 128,
 };
 
+struct OpalIoPhbErrorCommon {
+	uint32_t version;
+	uint32_t ioType;
+	uint32_t len;
+};
+
 struct OpalIoP7IOCPhbErrorData {
+	struct OpalIoPhbErrorCommon common;
+
 	uint32_t brdgCtl;
 
 	// P7IOC utl regs
@@ -530,14 +616,21 @@ int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint64_t pci_mem_size);
 int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
 
-int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer, uint64_t diag_buffer_len);
-int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer, uint64_t diag_buffer_len);
+int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
+				    uint64_t diag_buffer_len);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint8_t reinit_scope);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
 int64_t opal_set_slot_led_status(uint64_t phb_id, uint64_t slot_id, uint8_t led_type, uint8_t led_action);
 int64_t opal_get_epow_status(uint64_t *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
+int64_t opal_pci_next_error(uint64_t phb_id, uint64_t *first_frozen_pe,
+			    uint16_t *pci_error_type, uint16_t *severity);
+int64_t opal_pci_poll(uint64_t phb_id);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6fabe92..e88863f 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -107,4 +107,7 @@ OPAL_CALL(opal_pci_mask_pe_error,		OPAL_PCI_MASK_PE_ERROR);
 OPAL_CALL(opal_set_slot_led_status,		OPAL_SET_SLOT_LED_STATUS);
 OPAL_CALL(opal_get_epow_status,			OPAL_GET_EPOW_STATUS);
 OPAL_CALL(opal_set_system_attention_led,	OPAL_SET_SYSTEM_ATTENTION_LED);
+OPAL_CALL(opal_pci_next_error,			OPAL_PCI_NEXT_ERROR);
+OPAL_CALL(opal_pci_poll,			OPAL_PCI_POLL);
 OPAL_CALL(opal_pci_msi_eoi,			OPAL_PCI_MSI_EOI);
+OPAL_CALL(opal_pci_get_phb_diag_data2,		OPAL_PCI_GET_PHB_DIAG_DATA2);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 277343c..20af220 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -202,7 +202,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
 
 	spin_lock_irqsave(&phb->lock, flags);
 
-	rc = opal_pci_get_phb_diag_data(phb->opal_id, phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
+					 PNV_PCI_DIAG_BUF_SIZE);
 	has_diag = (rc == OPAL_SUCCESS);
 
 	rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (10 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

For EEH on PowerNV platform, the overall architecture is different
from that on pSeries platform. In order to support multiple I/O chips
in future, we split EEH to 3 layers for PowerNV platform: EEH core,
platform layer, I/O layer. It would give EEH implementation on PowerNV
platform much more flexibility in future.

The patch adds the EEH backend for P7IOC.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile   |    1 +
 arch/powerpc/platforms/powernv/eeh-ioda.c |   44 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |   22 ++++++++++++++
 3 files changed, 67 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-ioda.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index bcc3cb4..09bd0cb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,3 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
new file mode 100644
index 0000000..b9564d5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -0,0 +1,44 @@
+/*
+ * The file intends to implement the functions needed by EEH, which is
+ * built on IODA compliant chip. Actually, lots of functions related
+ * to EEH would be built based on the OPAL APIs.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/tce.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+struct pnv_eeh_ops ioda_eeh_ops = {
+	.post_init		= NULL,
+	.set_option		= NULL,
+	.get_state		= NULL,
+	.reset			= NULL,
+	.get_log		= NULL,
+	.configure_bridge	= NULL
+};
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 25d76c4..6f69b87 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -66,15 +66,34 @@ struct pnv_ioda_pe {
 	struct list_head	list;
 };
 
+/* IOC dependent EEH operations */
+#ifdef CONFIG_EEH
+struct pnv_eeh_ops {
+	int (*post_init)(struct pci_controller *hose);
+	int (*set_option)(struct eeh_pe *pe, int option);
+	int (*get_state)(struct eeh_pe *pe);
+	int (*reset)(struct eeh_pe *pe, int option);
+	int (*get_log)(struct eeh_pe *pe, int severity,
+		       char *drv_log, unsigned long len);
+	int (*configure_bridge)(struct eeh_pe *pe);
+};
+#endif /* CONFIG_EEH */
+
 struct pnv_phb {
 	struct pci_controller	*hose;
 	enum pnv_phb_type	type;
 	enum pnv_phb_model	model;
+	u64			hub_id;
 	u64			opal_id;
 	void __iomem		*regs;
 	int			initialized;
 	spinlock_t		lock;
 
+#ifdef CONFIG_EEH
+	struct pnv_eeh_ops	*eeh_ops;
+	int			eeh_enabled;
+#endif
+
 #ifdef CONFIG_PCI_MSI
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
@@ -150,6 +169,9 @@ struct pnv_phb {
 };
 
 extern struct pci_ops pnv_pci_ops;
+#ifdef CONFIG_EEH
+extern struct pnv_eeh_ops ioda_eeh_ops;
+#endif
 
 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 				      void *tce_mem, u64 tce_size,
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/27] powerpc/eeh: I/O chip post initialization
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (11 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The post initialization (struct eeh_ops::post_init) is called after
the EEH probe is done. On the other hand, the EEH core post
initialization is designed to call platform and then I/O chip backend
on PowerNV platform.

The patch adds the backend for I/O chip to notify the platform
that the specific PHB is ready to supply EEH service.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index b9564d5..60ac8fe 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -34,8 +34,27 @@
 #include "powernv.h"
 #include "pci.h"
 
+/**
+ * ioda_eeh_post_init - Chip dependent post initialization
+ * @hose: PCI controller
+ *
+ * The function will be called after eeh PEs and devices
+ * have been built. That means the EEH is ready to supply
+ * service with I/O cache.
+ */
+static int ioda_eeh_post_init(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	/* FIXME: Enable it for PHB3 later */
+	if (phb->type == PNV_PHB_IODA1)
+		phb->eeh_enabled = 1;
+
+	return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
-	.post_init		= NULL,
+	.post_init		= ioda_eeh_post_init,
 	.set_option		= NULL,
 	.get_state		= NULL,
 	.reset			= NULL,
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (12 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds the backend to enable or disable EEH functionality
for the specified PE. The backend is also used to enable MMIO or
DMA path for the problematic PE. It's notable that all PEs on
PowerNV platform support EEH functionality by default, and we
disallow to disable EEH for the specific PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   65 ++++++++++++++++++++++++++++-
 1 files changed, 64 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 60ac8fe..b77e90e 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -53,9 +53,72 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	return 0;
 }
 
+/**
+ * ioda_eeh_set_option - Set EEH operation or I/O setting
+ * @pe: EEH PE
+ * @option: options
+ *
+ * Enable or disable EEH option for the indicated PE. The
+ * function also can be used to enable I/O or DMA for the
+ * PE.
+ */
+static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
+{
+	s64 ret;
+	u32 pe_no;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	/* Check on PE number */
+	if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+		pr_err("%s: PE address %x out of range [0, %x] "
+		       "on PHB#%x\n",
+			__func__, pe->addr, phb->ioda.total_pe,
+			hose->global_number);
+		return -EINVAL;
+	}
+
+	pe_no = pe->addr;
+	switch (option) {
+	case EEH_OPT_DISABLE:
+		ret = -EEXIST;
+		break;
+	case EEH_OPT_ENABLE:
+		ret = 0;
+		break;
+	case EEH_OPT_THAW_MMIO:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO);
+		if (ret) {
+			pr_warning("%s: Failed to enable MMIO for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	case EEH_OPT_THAW_DMA:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_DMA);
+		if (ret) {
+			pr_warning("%s: Failed to enable DMA for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	default:
+		pr_warning("%s: Invalid option %d\n", __func__, option);
+		return -EINVAL;
+	}
+
+	return ret;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
-	.set_option		= NULL,
+	.set_option		= ioda_eeh_set_option,
 	.get_state		= NULL,
 	.reset			= NULL,
 	.get_log		= NULL,
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (13 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds I/O chip backend to retrieve the state for the
indicated PE. While the PE state is temperarily unavailable,
the upper layer (powernv platform) should return default delay
(1 second).

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   99 ++++++++++++++++++++++++++++-
 1 files changed, 98 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index b77e90e..7105a4e 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -116,10 +116,107 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
 	return ret;
 }
 
+/**
+ * ioda_eeh_get_state - Retrieve the state of PE
+ * @pe: EEH PE
+ *
+ * The PE's state should be retrieved from the PEEV, PEST
+ * IODA tables. Since the OPAL has exported the function
+ * to do it, it'd better to use that.
+ */
+static int ioda_eeh_get_state(struct eeh_pe *pe)
+{
+	s64 ret = 0;
+	u8 fstate;
+	u16 pcierr;
+	u32 pe_no;
+	int result;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	/*
+	 * Sanity check on PE address. The PHB PE address should
+	 * be zero.
+	 */
+	if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+		pr_err("%s: PE address %x out of range [0, %x] "
+		       "on PHB#%x\n",
+			__func__, pe->addr, phb->ioda.total_pe,
+			hose->global_number);
+		return EEH_STATE_NOT_SUPPORT;
+	}
+
+	/* Retrieve PE status through OPAL */
+	pe_no = pe->addr;
+	ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+			&fstate, &pcierr, NULL);
+	if (ret) {
+		pr_err("%s: Failed to get EEH status on "
+		       "PHB#%x-PE#%x\n, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return EEH_STATE_NOT_SUPPORT;
+	}
+
+	/* Check PHB status */
+	if (pe->type & EEH_PE_PHB) {
+		result = 0;
+		result &= ~EEH_STATE_RESET_ACTIVE;
+
+		if (pcierr != OPAL_EEH_PHB_ERROR) {
+			result |= EEH_STATE_MMIO_ACTIVE;
+			result |= EEH_STATE_DMA_ACTIVE;
+			result |= EEH_STATE_MMIO_ENABLED;
+			result |= EEH_STATE_DMA_ENABLED;
+		}
+
+		return result;
+	}
+
+	/* Parse result out */
+	result = 0;
+	switch (fstate) {
+	case OPAL_EEH_STOPPED_NOT_FROZEN:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_MMIO_ACTIVE;
+		result |= EEH_STATE_DMA_ACTIVE;
+		result |= EEH_STATE_MMIO_ENABLED;
+		result |= EEH_STATE_DMA_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_MMIO_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_DMA_ACTIVE;
+		result |= EEH_STATE_DMA_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_DMA_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		result |= EEH_STATE_MMIO_ACTIVE;
+		result |= EEH_STATE_MMIO_ENABLED;
+		break;
+	case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE:
+		result &= ~EEH_STATE_RESET_ACTIVE;
+		break;
+	case OPAL_EEH_STOPPED_RESET:
+		result |= EEH_STATE_RESET_ACTIVE;
+		break;
+	case OPAL_EEH_STOPPED_TEMP_UNAVAIL:
+		result |= EEH_STATE_UNAVAILABLE;
+		break;
+	case OPAL_EEH_STOPPED_PERM_UNAVAIL:
+		result |= EEH_STATE_NOT_SUPPORT;
+		break;
+	default:
+		pr_warning("%s: Unexpected EEH status 0x%x "
+			   "on PHB#%x-PE#%x\n",
+			__func__, fstate, hose->global_number, pe_no);
+	}
+
+	return result;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
-	.get_state		= NULL,
+	.get_state		= ioda_eeh_get_state,
 	.reset			= NULL,
 	.get_log		= NULL,
 	.configure_bridge	= NULL
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 16/27] powerpc/eeh: I/O chip PE reset
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (14 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds the I/O chip backend to do PE reset. For now, we
focus on PCI bus dependent PE. If PHB PE has been put into error
state, the PHB will take complete reset. Besides, the root bridge
will take fundamental or hot reset accordingly if the indicated
PE locates at the toppest of PCI hierarchy tree. Otherwise, the
upstream p2p bridge will take hot reset.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |  233 ++++++++++++++++++++++++++++-
 1 files changed, 232 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 7105a4e..f552e23 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -213,11 +213,242 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
 	return result;
 }
 
+static int ioda_eeh_pe_clear(struct eeh_pe *pe)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	u32 pe_no;
+	u8 fstate;
+	u16 pcierr;
+	s64 ret;
+
+	pe_no = pe->addr;
+	hose = pe->phb;
+	phb = pe->phb->private_data;
+
+	/* Clear the EEH error on the PE */
+	ret = opal_pci_eeh_freeze_clear(phb->opal_id,
+			pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+	if (ret) {
+		pr_err("%s: Failed to clear EEH error for "
+		       "PHB#%x-PE#%x, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return -EIO;
+	}
+
+	/*
+	 * Read the PE state back and verify that the frozen
+	 * state has been removed.
+	 */
+	ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+			&fstate, &pcierr, NULL);
+	if (ret) {
+		pr_err("%s: Failed to get EEH status on "
+		       "PHB#%x-PE#%x\n, err=%lld\n",
+			__func__, hose->global_number, pe_no, ret);
+		return -EIO;
+	}
+	if (fstate != OPAL_EEH_STOPPED_NOT_FROZEN) {
+		pr_err("%s: Frozen state not cleared on "
+		       "PHB#%x-PE#%x, sts=%x\n",
+			__func__, hose->global_number, pe_no, fstate);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static s64 ioda_eeh_phb_poll(struct pnv_phb *phb)
+{
+	s64 rc = OPAL_HARDWARE;
+
+	while (1) {
+		rc = opal_pci_poll(phb->opal_id);
+		if (rc <= 0)
+			break;
+
+		msleep(rc);
+	}
+
+	return rc;
+}
+
+static int ioda_eeh_phb_reset(struct pci_controller *hose, int option)
+{
+	struct pnv_phb *phb = hose->private_data;
+	s64 rc = OPAL_HARDWARE;
+
+	pr_debug("%s: Reset PHB#%x, option=%d\n",
+		__func__, hose->global_number, option);
+
+	/* Issue PHB complete reset request */
+	if (option == EEH_RESET_FUNDAMENTAL ||
+	    option == EEH_RESET_HOT)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PHB_COMPLETE,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_DEACTIVATE)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PHB_COMPLETE,
+				OPAL_DEASSERT_RESET);
+	if (rc < 0)
+		goto out;
+
+	/*
+	 * Poll state of the PHB until the request is done
+	 * successfully.
+	 */
+	rc = ioda_eeh_phb_poll(phb);
+out:
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int ioda_eeh_root_reset(struct pci_controller *hose, int option)
+{
+	struct pnv_phb *phb = hose->private_data;
+	s64 rc = OPAL_SUCCESS;
+
+	pr_debug("%s: Reset PHB#%x, option=%d\n",
+		__func__, hose->global_number, option);
+
+	/*
+	 * During the reset deassert time, we needn't care
+	 * the reset scope because the firmware does nothing
+	 * for fundamental or hot reset during deassert phase.
+	 */
+	if (option == EEH_RESET_FUNDAMENTAL)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_FUNDAMENTAL_RESET,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_HOT)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_HOT_RESET,
+				OPAL_ASSERT_RESET);
+	else if (option == EEH_RESET_DEACTIVATE)
+		rc = opal_pci_reset(phb->opal_id,
+				OPAL_PCI_HOT_RESET,
+				OPAL_DEASSERT_RESET);
+	if (rc < 0)
+		goto out;
+
+	/* Poll state of the PHB until the request is done */
+	rc = ioda_eeh_phb_poll(phb);
+out:
+	if (rc != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int ioda_eeh_bridge_reset(struct pci_controller *hose,
+		struct pci_dev *dev, int option)
+{
+	u16 ctrl;
+
+	pr_debug("%s: Reset device %04x:%02x:%02x.%01x with option %d\n",
+		__func__, hose->global_number, dev->bus->number,
+		PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), option);
+
+	switch (option) {
+	case EEH_RESET_FUNDAMENTAL:
+	case EEH_RESET_HOT:
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
+		ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+		break;
+	case EEH_RESET_DEACTIVATE:
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
+		ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+		break;
+	}
+
+	return 0;
+}
+
+/**
+ * ioda_eeh_reset - Reset the indicated PE
+ * @pe: EEH PE
+ * @option: reset option
+ *
+ * Do reset on the indicated PE. For PCI bus sensitive PE,
+ * we need to reset the parent p2p bridge. The PHB has to
+ * be reinitialized if the p2p bridge is root bridge. For
+ * PCI device sensitive PE, we will try to reset the device
+ * through FLR. For now, we don't have OPAL APIs to do HARD
+ * reset yet, so all reset would be SOFT (HOT) reset.
+ */
+static int ioda_eeh_reset(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct eeh_dev *edev;
+	struct pci_dev *dev;
+	int ret;
+
+	/*
+	 * Anyway, we have to clear the problematic state for the
+	 * corresponding PE. However, we needn't do it if the PE
+	 * is PHB associated. That means the PHB is having fatal
+	 * errors and it needs reset. Further more, the AIB interface
+	 * isn't reliable any more.
+	 */
+	if (!(pe->type & EEH_PE_PHB) &&
+	    (option == EEH_RESET_HOT ||
+	    option == EEH_RESET_FUNDAMENTAL)) {
+		ret = ioda_eeh_pe_clear(pe);
+		if (ret)
+			return -EIO;
+	}
+
+	/*
+	 * The rules applied to reset, either fundamental or hot reset:
+	 *
+	 * We always reset the direct upstream bridge of the PE. If the
+	 * direct upstream bridge isn't root bridge, we always take hot
+	 * reset no matter what option (fundamental or hot) is. Otherwise,
+	 * we should do the reset according to the required option.
+	 */
+	if (pe->type & EEH_PE_PHB) {
+		ret = ioda_eeh_phb_reset(hose, option);
+	} else {
+		if (pe->type & EEH_PE_DEVICE) {
+			/*
+			 * If it's device PE, we didn't refer to the parent
+			 * PCI bus yet. So we have to figure it out indirectly.
+			 */
+			edev = list_first_entry(&pe->edevs,
+					struct eeh_dev, list);
+			dev = eeh_dev_to_pci_dev(edev);
+			dev = dev->bus->self;
+		} else {
+			/*
+			 * If it's bus PE, the parent PCI bus is already there
+			 * and just pick it up.
+			 */
+			dev = pe->bus->self;
+		}
+
+		/*
+		 * Do reset based on the fact that the direct upstream bridge
+		 * is root bridge (port) or not.
+		 */
+		if (dev->bus->number == 0)
+			ret = ioda_eeh_root_reset(hose, option);
+		else
+			ret = ioda_eeh_bridge_reset(hose, dev, option);
+	}
+
+	return ret;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
 	.get_state		= ioda_eeh_get_state,
-	.reset			= NULL,
+	.reset			= ioda_eeh_reset,
 	.get_log		= NULL,
 	.configure_bridge	= NULL
 };
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (15 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds backends to retrieve error log and configure p2p
bridges for the indicated PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   57 ++++++++++++++++++++++++++++-
 1 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index f552e23..95f7d96 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -444,11 +444,64 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
 	return ret;
 }
 
+/**
+ * ioda_eeh_get_log - Retrieve error log
+ * @pe: EEH PE
+ * @severity: Severity level of the log
+ * @drv_log: buffer to store the log
+ * @len: space of the log buffer
+ *
+ * The function is used to retrieve error log from P7IOC.
+ */
+static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
+		char *drv_log, unsigned long len)
+{
+	s64 ret;
+	unsigned long flags;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	spin_lock_irqsave(&phb->lock, flags);
+
+	ret = opal_pci_get_phb_diag_data2(phb->opal_id,
+			phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+	if (ret) {
+		spin_unlock_irqrestore(&phb->lock, flags);
+		pr_warning("%s: Failed to retrieve log for PHB#%x-PE#%x\n",
+			__func__, hose->global_number, pe->addr);
+		return -EIO;
+	}
+
+	/*
+	 * FIXME: We probably need log the error in somewhere.
+	 * Lets make it up in future.
+	 */
+	/* pr_info("%s", phb->diag.blob); */
+
+	spin_unlock_irqrestore(&phb->lock, flags);
+
+	return 0;
+}
+
+/**
+ * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE
+ * @pe: EEH PE
+ *
+ * For particular PE, it might have included PCI bridges. In order
+ * to make the PE work properly, those PCI bridges should be configured
+ * correctly. However, we need do nothing on P7IOC since the reset
+ * function will do everything that should be covered by the function.
+ */
+static int ioda_eeh_configure_bridge(struct eeh_pe *pe)
+{
+	return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
 	.set_option		= ioda_eeh_set_option,
 	.get_state		= ioda_eeh_get_state,
 	.reset			= ioda_eeh_reset,
-	.get_log		= NULL,
-	.configure_bridge	= NULL
+	.get_log		= ioda_eeh_get_log,
+	.configure_bridge	= ioda_eeh_configure_bridge
 };
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 18/27] powerpc/eeh: PowerNV EEH backends
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (16 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch adds EEH backends for PowerNV platform. It's notable that
part of those EEH backends call to the I/O chip dependent backends.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile      |    2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  396 ++++++++++++++++++++++++++
 2 files changed, 397 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-powernv.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 09bd0cb..7fe5951 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)	+= eeh-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
new file mode 100644
index 0000000..decb317
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -0,0 +1,396 @@
+/*
+ * The file intends to implement the platform dependent EEH operations on
+ * powernv platform. Actually, the powernv was created in order to fully
+ * hypervisor support.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/atomic.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/msi.h>
+#include <linux/of.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/firmware.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/**
+ * powernv_eeh_init - EEH platform dependent initialization
+ *
+ * EEH platform dependent initialization on powernv
+ */
+static int powernv_eeh_init(void)
+{
+	/* We require OPALv3 */
+	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+		pr_warning("%s: OPALv3 is required !\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Set EEH probe mode */
+	eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
+
+	return 0;
+}
+
+/**
+ * powernv_eeh_post_init - EEH platform dependent post initialization
+ *
+ * EEH platform dependent post initialization on powernv. When
+ * the function is called, the EEH PEs and devices should have
+ * been built. If the I/O cache staff has been built, EEH is
+ * ready to supply service.
+ */
+static int powernv_eeh_post_init(void)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+	int ret = 0;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+
+		if (phb->eeh_ops && phb->eeh_ops->post_init) {
+			ret = phb->eeh_ops->post_init(hose);
+			if (ret)
+				break;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_dev_probe - Do probe on PCI device
+ * @dev: PCI device
+ * @flag: unused
+ *
+ * When EEH module is installed during system boot, all PCI devices
+ * are checked one by one to see if it supports EEH. The function
+ * is introduced for the purpose. By default, EEH has been enabled
+ * on all PCI devices. That's to say, we only need do necessary
+ * initialization on the corresponding eeh device and create PE
+ * accordingly.
+ *
+ * It's notable that's unsafe to retrieve the EEH device through
+ * the corresponding PCI device. During the PCI device hotplug, which
+ * was possiblly triggered by EEH core, the binding between EEH device
+ * and the PCI device isn't built yet.
+ */
+static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct device_node *dn = pci_device_to_OF_node(dev);
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+
+	/*
+	 * When probing the root bridge, which doesn't have any
+	 * subordinate PCI devices. We don't have OF node for
+	 * the root bridge. So it's not reasonable to continue
+	 * the probing.
+	 */
+	if (!dn || !edev)
+		return 0;
+
+	/* Skip for PCI-ISA bridge */
+	if ((dev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+		return 0;
+
+	/* Initialize eeh device */
+	edev->class_code	= dev->class;
+	edev->mode		= 0;
+	edev->config_addr	= ((dev->bus->number << 8) | dev->devfn);
+	edev->pe_config_addr	= phb->bdfn_to_pe(phb, dev->bus, dev->devfn & 0xff);
+
+	/* Create PE */
+	eeh_add_to_parent_pe(edev);
+
+	/*
+	 * Enable EEH explicitly so that we will do EEH check
+	 * while accessing I/O stuff
+	 *
+	 * FIXME: Enable that for PHB3 later
+	 */
+	if (phb->type == PNV_PHB_IODA1)
+		eeh_subsystem_enabled = 1;
+
+	/* Save memory bars */
+	eeh_save_bars(edev);
+
+	return 0;
+}
+
+/**
+ * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable
+ * @pe: EEH PE
+ * @option: operation to be issued
+ *
+ * The function is used to control the EEH functionality globally.
+ * Currently, following options are support according to PAPR:
+ * Enable EEH, Disable EEH, Enable MMIO and Enable DMA
+ */
+static int powernv_eeh_set_option(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	/*
+	 * What we need do is pass it down for hardware
+	 * implementation to handle it.
+	 */
+	if (phb->eeh_ops && phb->eeh_ops->set_option)
+		ret = phb->eeh_ops->set_option(pe, option);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_get_pe_addr - Retrieve PE address
+ * @pe: EEH PE
+ *
+ * Retrieve the PE address according to the given tranditional
+ * PCI BDF (Bus/Device/Function) address.
+ */
+static int powernv_eeh_get_pe_addr(struct eeh_pe *pe)
+{
+	return pe->addr;
+}
+
+/**
+ * powernv_eeh_get_state - Retrieve PE state
+ * @pe: EEH PE
+ * @delay: delay while PE state is temporarily unavailable
+ *
+ * Retrieve the state of the specified PE. For IODA-compitable
+ * platform, it should be retrieved from IODA table. Therefore,
+ * we prefer passing down to hardware implementation to handle
+ * it.
+ */
+static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = EEH_STATE_NOT_SUPPORT;
+
+	if (phb->eeh_ops && phb->eeh_ops->get_state) {
+		ret = phb->eeh_ops->get_state(pe);
+
+		/*
+		 * If the PE state is temporarily unavailable,
+		 * to inform the EEH core delay for default
+		 * period (1 second)
+		 */
+		if (delay) {
+			*delay = 0;
+			if (ret & EEH_STATE_UNAVAILABLE)
+				*delay = 1000;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_reset - Reset the specified PE
+ * @pe: EEH PE
+ * @option: reset option
+ *
+ * Reset the specified PE
+ */
+static int powernv_eeh_reset(struct eeh_pe *pe, int option)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	if (phb->eeh_ops && phb->eeh_ops->reset)
+		ret = phb->eeh_ops->reset(pe, option);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_wait_state - Wait for PE state
+ * @pe: EEH PE
+ * @max_wait: maximal period in microsecond
+ *
+ * Wait for the state of associated PE. It might take some time
+ * to retrieve the PE's state.
+ */
+static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait)
+{
+	int ret;
+	int mwait;
+
+	while (1) {
+		ret = powernv_eeh_get_state(pe, &mwait);
+
+		/*
+		 * If the PE's state is temporarily unavailable,
+		 * we have to wait for the specified time. Otherwise,
+		 * the PE's state will be returned immediately.
+		 */
+		if (ret != EEH_STATE_UNAVAILABLE)
+			return ret;
+
+		max_wait -= mwait;
+		if (max_wait <= 0) {
+			pr_warning("%s: Timeout getting PE#%x's state (%d)\n",
+				   __func__, pe->addr, max_wait);
+			return EEH_STATE_NOT_SUPPORT;
+		}
+
+		msleep(mwait);
+	}
+
+	return EEH_STATE_NOT_SUPPORT;
+}
+
+/**
+ * powernv_eeh_get_log - Retrieve error log
+ * @pe: EEH PE
+ * @severity: temporary or permanent error log
+ * @drv_log: driver log to be combined with retrieved error log
+ * @len: length of driver log
+ *
+ * Retrieve the temporary or permanent error from the PE.
+ */
+static int powernv_eeh_get_log(struct eeh_pe *pe, int severity,
+			char *drv_log, unsigned long len)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = -EEXIST;
+
+	if (phb->eeh_ops && phb->eeh_ops->get_log)
+		ret = phb->eeh_ops->get_log(pe, severity, drv_log, len);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE
+ * @pe: EEH PE
+ *
+ * The function will be called to reconfigure the bridges included
+ * in the specified PE so that the mulfunctional PE would be recovered
+ * again.
+ */
+static int powernv_eeh_configure_bridge(struct eeh_pe *pe)
+{
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+	int ret = 0;
+
+	if (phb->eeh_ops && phb->eeh_ops->configure_bridge)
+		ret = phb->eeh_ops->configure_bridge(pe);
+
+	return ret;
+}
+
+/**
+ * powernv_eeh_read_config - Read PCI config space
+ * @dn: device node
+ * @where: PCI address
+ * @size: size to read
+ * @val: return value
+ *
+ * Read config space from the speicifed device
+ */
+static int powernv_eeh_read_config(struct device_node *dn, int where,
+				   int size, u32 *val)
+{
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_controller *hose = edev->phb;
+
+	return hose->ops->read(dev->bus, dev->devfn, where, size, val);
+}
+
+/**
+ * powernv_eeh_write_config - Write PCI config space
+ * @dn: device node
+ * @where: PCI address
+ * @size: size to write
+ * @val: value to be written
+ *
+ * Write config space to the specified device
+ */
+static int powernv_eeh_write_config(struct device_node *dn, int where,
+				    int size, u32 val)
+{
+	struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_controller *hose = edev->phb;
+
+	hose = pci_bus_to_host(dev->bus);
+
+	return hose->ops->write(dev->bus, dev->devfn, where, size, val);
+}
+
+static struct eeh_ops powernv_eeh_ops = {
+	.name                   = "powernv",
+	.init                   = powernv_eeh_init,
+	.post_init              = powernv_eeh_post_init,
+	.of_probe               = NULL,
+	.dev_probe              = powernv_eeh_dev_probe,
+	.set_option             = powernv_eeh_set_option,
+	.get_pe_addr            = powernv_eeh_get_pe_addr,
+	.get_state              = powernv_eeh_get_state,
+	.reset                  = powernv_eeh_reset,
+	.wait_state             = powernv_eeh_wait_state,
+	.get_log                = powernv_eeh_get_log,
+	.configure_bridge       = powernv_eeh_configure_bridge,
+	.read_config            = powernv_eeh_read_config,
+	.write_config           = powernv_eeh_write_config
+};
+
+/**
+ * eeh_powernv_init - Register platform dependent EEH operations
+ *
+ * EEH initialization on powernv platform. This function should be
+ * called before any EEH related functions.
+ */
+static int __init eeh_powernv_init(void)
+{
+	int ret = -EINVAL;
+
+	if (!machine_is(powernv))
+		return ret;
+
+	ret = eeh_ops_register(&powernv_eeh_ops);
+	if (!ret)
+		pr_info("EEH: PowerNV platform initialized\n");
+	else
+		pr_info("EEH: Failed to initialize PowerNV platform (%d)\n", ret);
+
+	return ret;
+}
+
+early_initcall(eeh_powernv_init);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 19/27] powerpc/eeh: Initialization for PowerNV
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (17 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch initializes EEH for PowerNV platform. Because the OPAL
APIs requires HUB ID, we need trace that through struct pnv_phb.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c   |   16 +++++++++++++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |    6 ++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9c9d15e..48b0940 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -973,6 +973,11 @@ static void pnv_pci_ioda_fixup(void)
 	pnv_pci_ioda_setup_PEs();
 	pnv_pci_ioda_setup_seg();
 	pnv_pci_ioda_setup_DMA();
+
+#ifdef CONFIG_EEH
+	eeh_addr_cache_build();
+	eeh_init();
+#endif
 }
 
 /*
@@ -1049,7 +1054,8 @@ static void pnv_pci_ioda_shutdown(struct pnv_phb *phb)
 		       OPAL_ASSERT_RESET);
 }
 
-void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
+void __init pnv_pci_init_ioda_phb(struct device_node *np,
+				  u64 hub_id, int ioda_type)
 {
 	struct pci_controller *hose;
 	static int primary = 1;
@@ -1087,6 +1093,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 	hose->first_busno = 0;
 	hose->last_busno = 0xff;
 	hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = ioda_type;
 
@@ -1172,6 +1179,9 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 		phb->ioda.io_size, phb->ioda.io_segsize);
 
 	phb->hose->ops = &pnv_pci_ops;
+#ifdef CONFIG_EEH
+	phb->eeh_ops = &ioda_eeh_ops;
+#endif
 
 	/* Setup RID -> PE mapping function */
 	phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
@@ -1212,7 +1222,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
 
 void pnv_pci_init_ioda2_phb(struct device_node *np)
 {
-	pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2);
+	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }
 
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
@@ -1235,6 +1245,6 @@ void __init pnv_pci_init_ioda_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		/* Look for IODA1 PHBs */
 		if (of_device_is_compatible(phbn, "ibm,ioda-phb"))
-			pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1);
+			pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1);
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 92b37a0..ae72616 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,7 +92,7 @@ static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
 }
 
-static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
+static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
 					   void *tce_mem, u64 tce_size)
 {
 	struct pnv_phb *phb;
@@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
 	phb->hose->first_busno = 0;
 	phb->hose->last_busno = 0xff;
 	phb->hose->private_data = phb;
+	phb->hub_id = hub_id;
 	phb->opal_id = phb_id;
 	phb->type = PNV_PHB_P5IOC2;
 	phb->model = PNV_PHB_MODEL_P5IOC2;
@@ -226,7 +227,8 @@ void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
 	for_each_child_of_node(np, phbn) {
 		if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
 		    of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
-			pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
+			pnv_pci_init_p5ioc2_phb(phbn, hub_id,
+					tce_mem, tce_per_phb);
 			tce_mem += tce_per_phb;
 		}
 	}
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 20/27] powerpc/eeh: Enable EEH check for config access
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (18 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch enables EEH check and let EEH core to process the EEH
errors for PowerNV platform while accessing config space. Originally,
the implementation already had mechanism to check EEH errors and
tried to recover from them. However, we never let EEH core to handle
the EEH errors.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci.c |   40 +++++++++++++++++++++++++++++++++-
 1 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 20af220..6d9a506 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -32,6 +32,8 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/firmware.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -259,6 +261,10 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 {
 	struct pci_controller *hose = pci_bus_to_host(bus);
 	struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_EEH
+	struct device_node *busdn, *dn;
+	struct eeh_pe *phb_pe = NULL;
+#endif
 	u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
 	s64 rc;
 
@@ -291,8 +297,34 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 	cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n",
 		bus->number, devfn, where, size, *val);
 
-	/* Check if the PHB got frozen due to an error (no response) */
+	/*
+	 * Check if the specified PE has been put into frozen
+	 * state. On the other hand, we needn't do that while
+	 * the PHB has been put into frozen state because of
+	 * PHB-fatal errors.
+	 */
+#ifdef CONFIG_EEH
+	phb_pe = eeh_phb_pe_get(hose);
+	if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
+		return PCIBIOS_SUCCESSFUL;
+
+	if (phb->eeh_enabled) {
+		if (*val == EEH_IO_ERROR_VALUE(size)) {
+			busdn = pci_bus_to_OF_node(bus);
+			for (dn = busdn->child; dn; dn = dn->sibling) {
+				struct pci_dn *pdn = PCI_DN(dn);
+
+				if (pdn && pdn->devfn == devfn &&
+				    eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
+					return PCIBIOS_DEVICE_NOT_FOUND;
+			}
+		}
+	} else {
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+	}
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
@@ -323,8 +355,14 @@ static int pnv_pci_write_config(struct pci_bus *bus,
 	default:
 		return PCIBIOS_FUNC_NOT_SUPPORTED;
 	}
+
 	/* Check if the PHB got frozen due to an error (no response) */
+#ifdef CONFIG_EEH
+	if (!phb->eeh_enabled)
+		pnv_pci_config_check_eeh(phb, bus, bdfn);
+#else
 	pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
 	return PCIBIOS_SUCCESSFUL;
 }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (19 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-16  5:12   ` Benjamin Herrenschmidt
  2013-06-15  9:03 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
                   ` (5 subsequent siblings)
  26 siblings, 1 reply; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

On PowerNV platform, the EEH event is produced either by detect
on accessing config or I/O registers, or by interrupts dedicated
for EEH report. The patch adds support to process the interrupts
dedicated for EEH report.

Firstly, the kernel thread will be waken up to process incoming
interrupt. The PHBs will be scanned one by one to process all
existing EEH errors. Besides, There're mulple EEH errors that can
be reported from interrupts and we have differentiated actions
against them:

- If the IOC is dead, all PCI buses under all PHBs will be removed
  from the system.
- If the PHB is dead, all PCI buses under the PHB will be removed
  from the system.
- If the PHB is fenced, EEH event will be sent to EEH core and
  the fenced PHB is expected to be resetted completely.
- If specific PE has been put into frozen state, EEH event will
  be sent to EEH core so that the PE will be resetted.
- If the error is informational one, we just output the related
  registers for debugging purpose and no more action will be
  taken.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h           |    1 +
 arch/powerpc/kernel/eeh_driver.c         |   10 +
 arch/powerpc/platforms/powernv/Makefile  |    2 +-
 arch/powerpc/platforms/powernv/pci-err.c |  519 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h     |    1 +
 5 files changed, 532 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-err.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7ebf522..b52d8d7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -52,6 +52,7 @@ struct device_node;
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE		*/
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE	*/
+#define EEH_PE_PHB_DEAD		(1 << 2)	/* Dead PHB		*/
 
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device	*/
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 0acc5a2..c7e13b0 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -439,6 +439,15 @@ void eeh_handle_event(struct eeh_pe *pe)
 	 */
 	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 
+	/*
+	 * On PowerNV platform, the PHB might have been dead. We need
+	 * remove all subordinate PCI buses under the dead PHB.
+	 */
+	if (eeh_probe_mode_dev() &&
+	    (pe->type & EEH_PE_PHB) &&
+	    (pe->state & EEH_PE_PHB_DEAD))
+		goto remove_bus;
+
 	/* Get the current PCI slot state. This can take a long time,
 	 * sometimes over 3 seconds for certain systems.
 	 */
@@ -542,6 +551,7 @@ hard_fail:
 perm_error:
 	eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
+remove_bus:
 	/* Notify all devices that they're about to go down. */
 	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 7fe5951..912fa7c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y			+= opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_EEH)	+= pci-err.o eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
new file mode 100644
index 0000000..e54135b
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -0,0 +1,519 @@
+/*
+ * The file instends to handle those interrupts dedicated for error
+ * detection from IOC chips. Currently, we only support P7IOC and
+ * need support more IOC chips in the future. The interrupts have
+ * been exported to hypervisor through "opal-interrupts" of "ibm,opal"
+ * OF node. When one of them comes in, the hypervisor simply turns
+ * to the firmware and expects the appropriate events returned. In
+ * turn, we will format one message and queue that in order to process
+ * it at later point.
+ *
+ * On the other hand, we need maintain information about the states
+ * of IO HUBs and their associated PHBs. The information would be
+ * shared by hypervisor and guests in future. While hypervisor or guests
+ * accessing IO HUBs, PHBs and PEs, the state should be checked and
+ * return approriate results. That would benefit EEH RTAS emulation in
+ * hypervisor as well.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+#include <linux/semaphore.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <linux/kthread.h>
+#include <linux/msi.h>
+
+#include <asm/firmware.h>
+#include <asm/sections.h>
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/pci-bridge.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/ppc-pci.h>
+#include <asm/opal.h>
+#include <asm/iommu.h>
+#include <asm/tce.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/* Debugging option */
+#ifdef PCI_ERR_DEBUG_ON
+#define PCI_ERR_DBG(args...)	pr_info(args)
+#else
+#define PCI_ERR_DBG(args...)
+#endif
+
+static struct task_struct *pci_err_thread;
+static struct semaphore pci_err_int_sem;
+static char *pci_err_diag;
+
+static int pci_err_dead_phb(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct eeh_pe *phb_pe;
+	unsigned long flags;
+
+	if (phb->removed)
+		return 0;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_debug("%s Can't find PE for PHB#%d\n",
+			 __func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * Mark the PHB has been dead and the EEH core
+	 * should remove all subordinate PCI buses.
+	 */
+	eeh_serialize_lock(&flags);
+	phb->removed = 1;
+	if (phb_pe->state & EEH_PE_PHB_DEAD) {
+		eeh_serialize_unlock(flags);
+		return 0;
+	}
+
+	PCI_ERR_DBG("PCI_ERR: Mark PHB#%x dead and send event "
+		    "to EEH core\n", hose->global_number);
+	eeh_pe_state_mark(phb_pe, EEH_PE_PHB_DEAD);
+	eeh_serialize_unlock(flags);
+	eeh_send_failure_event(phb_pe);
+
+	return 0;
+}
+
+static int pci_err_dead_ioc(void)
+{
+	struct pci_controller *hose, *tmp;
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+		pci_err_dead_phb(hose);
+
+	return 0;
+}
+
+/*
+ * When we get global interrupts (e.g. P7IOC RGC), PCI error happens
+ * in critical component of the IOC or PHB. For the formal case, the
+ * firmware just returns OPAL_PCI_ERR_CLASS_HUB and we needn't proceed.
+ * For the late case, we probably need reset one particular PHB. For
+ * that, we're doing is to send EEH event to the toppset PE of that
+ * problematic PHB so that the PHB can be reset by the EEH core.
+ */
+static int pci_err_check_phb(struct pci_controller *hose)
+{
+	struct eeh_pe *phb_pe;
+	unsigned long flags;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_debug("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/* Send event if possible */
+	eeh_serialize_lock(&flags);
+	if (phb_pe->state & EEH_PE_ISOLATED) {
+		eeh_serialize_unlock(flags);
+		return 0;
+	}
+
+	PCI_ERR_DBG("PCI_ERR: Fence PHB#%x and send event "
+		    "to EEH core\n", hose->global_number);
+	eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+	eeh_serialize_unlock(flags);
+
+	WARN(1, "EEH: PHB failure detected\n");
+	eeh_send_failure_event(phb_pe);
+
+	return 0;
+}
+
+/*
+ * When we get interrupts from PHB, there are probablly some PEs that
+ * have been put into frozen state. What we need do is sent one message
+ * to the EEH device, no matter which one it is, so that the EEH core
+ * can check it out and do PE reset accordingly.
+ */
+static int pci_err_check_pe(struct pci_controller *hose, u16 pe_no)
+{
+	struct eeh_pe *phb_pe, *pe;
+	struct eeh_dev dev, *edev;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(hose);
+	if (!phb_pe) {
+		pr_warning("%s Can't find PE for PHB#%d\n",
+			__func__, hose->global_number);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PHB#%d PE found\n",
+		hose->global_number);
+
+	/*
+	 * If the PHB has been put into fenced state, we
+	 * needn't send the duplicate event because the
+	 * whole PHB is going to take reset.
+	 */
+	if (phb_pe->state & EEH_PE_ISOLATED)
+		return 0;
+
+	/* Find the PE according to PE# */
+	memset(&dev, 0, sizeof(struct eeh_dev));
+	dev.phb = hose;
+	dev.pe_config_addr = pe_no;
+	pe = eeh_pe_get(&dev);
+	if (!pe) {
+		pr_debug("%s: Can't find PE for PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: PE (%x) found for PHB#%x - PE#%x\n",
+		pe->addr, hose->global_number, pe_no);
+
+	/*
+	 * It doesn't matter which EEH device to get
+	 * the message. Just pick up the one on the
+	 * toppest position.
+	 */
+	edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
+	if (!edev) {
+		pr_err("%s: No EEH devices hooked on PHB#%x - PE#%x\n",
+			__func__, hose->global_number, pe_no);
+		return -EEXIST;
+	}
+	PCI_ERR_DBG("PCI_ERR: First EEH device found on PHB#%x - PE#%x\n",
+		hose->global_number, pe_no);
+
+	eeh_dev_check_failure(edev);
+
+	return 0;
+}
+
+static void pci_err_hub_diag_common(struct OpalIoP7IOCErrorData *data)
+{
+	/* GEM */
+	pr_info("  GEM XFIR:        %016llx\n", data->gemXfir);
+	pr_info("  GEM RFIR:        %016llx\n", data->gemRfir);
+	pr_info("  GEM RIRQFIR:     %016llx\n", data->gemRirqfir);
+	pr_info("  GEM Mask:        %016llx\n", data->gemMask);
+	pr_info("  GEM RWOF:        %016llx\n", data->gemRwof);
+
+	/* LEM */
+	pr_info("  LEM FIR:         %016llx\n", data->lemFir);
+	pr_info("  LEM Error Mask:  %016llx\n", data->lemErrMask);
+	pr_info("  LEM Action 0:    %016llx\n", data->lemAction0);
+	pr_info("  LEM Action 1:    %016llx\n", data->lemAction1);
+	pr_info("  LEM WOF:         %016llx\n", data->lemWof);
+}
+
+static void pci_err_hub_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoP7IOCErrorData *data;
+	long ret;
+
+	data = (struct OpalIoP7IOCErrorData *)pci_err_diag;
+	ret = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get HUB#%llx diag-data, ret=%ld\n",
+			__func__, phb->hub_id, ret);
+		return;
+	}
+
+	/* Check the error type */
+	if (data->type <= OPAL_P7IOC_DIAG_TYPE_NONE ||
+	    data->type >= OPAL_P7IOC_DIAG_TYPE_LAST) {
+		pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n",
+			__func__, phb->hub_id, data->type);
+		return;
+	}
+
+	switch (data->type) {
+	case OPAL_P7IOC_DIAG_TYPE_RGC:
+		pr_info("P7IOC diag-data for RGC\n\n");
+		pci_err_hub_diag_common(data);
+		pr_info("  RGC Status:      %016llx\n", data->rgc.rgcStatus);
+		pr_info("  RGC LDCP:        %016llx\n", data->rgc.rgcLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_BI:
+		pr_info("P7IOC diag-data for BI %s\n\n",
+			data->bi.biDownbound ? "Downbound" : "Upbound");
+		pci_err_hub_diag_common(data);
+		pr_info("  BI LDCP 0:       %016llx\n", data->bi.biLdcp0);
+		pr_info("  BI LDCP 1:       %016llx\n", data->bi.biLdcp1);
+		pr_info("  BI LDCP 2:       %016llx\n", data->bi.biLdcp2);
+		pr_info("  BI Fence Status: %016llx\n", data->bi.biFenceStatus);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_CI:
+		pr_info("P7IOC diag-data for CI Port %d\\nn",
+			data->ci.ciPort);
+		pci_err_hub_diag_common(data);
+		pr_info("  CI Port Status:  %016llx\n", data->ci.ciPortStatus);
+		pr_info("  CI Port LDCP:    %016llx\n", data->ci.ciPortLdcp);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_MISC:
+		pr_info("P7IOC diag-data for MISC\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	case OPAL_P7IOC_DIAG_TYPE_I2C:
+		pr_info("P7IOC diag-data for I2C\n\n");
+		pci_err_hub_diag_common(data);
+		break;
+	}
+}
+
+static void pci_err_phb_p7ioc_diag(struct pci_controller *hose,
+				   struct OpalIoPhbErrorCommon *common)
+{
+	struct OpalIoP7IOCPhbErrorData *data;
+	int i;
+
+	data = (struct OpalIoP7IOCPhbErrorData *)common;
+
+	pr_info("P7IOC PHB#%x Diag-data (Version: %d)\n\n",
+		hose->global_number, common->version);
+
+	pr_info("  brdgCtl:              %08x\n", data->brdgCtl);
+
+	pr_info("  portStatusReg:        %08x\n", data->portStatusReg);
+	pr_info("  rootCmplxStatus:      %08x\n", data->rootCmplxStatus);
+	pr_info("  busAgentStatus:       %08x\n", data->busAgentStatus);
+
+	pr_info("  deviceStatus:         %08x\n", data->deviceStatus);
+	pr_info("  slotStatus:           %08x\n", data->slotStatus);
+	pr_info("  linkStatus:           %08x\n", data->linkStatus);
+	pr_info("  devCmdStatus:         %08x\n", data->devCmdStatus);
+	pr_info("  devSecStatus:         %08x\n", data->devSecStatus);
+
+	pr_info("  rootErrorStatus:      %08x\n", data->rootErrorStatus);
+	pr_info("  uncorrErrorStatus:    %08x\n", data->uncorrErrorStatus);
+	pr_info("  corrErrorStatus:      %08x\n", data->corrErrorStatus);
+	pr_info("  tlpHdr1:              %08x\n", data->tlpHdr1);
+	pr_info("  tlpHdr2:              %08x\n", data->tlpHdr2);
+	pr_info("  tlpHdr3:              %08x\n", data->tlpHdr3);
+	pr_info("  tlpHdr4:              %08x\n", data->tlpHdr4);
+	pr_info("  sourceId:             %08x\n", data->sourceId);
+
+	pr_info("  errorClass:           %016llx\n", data->errorClass);
+	pr_info("  correlator:           %016llx\n", data->correlator);
+	pr_info("  p7iocPlssr:           %016llx\n", data->p7iocPlssr);
+	pr_info("  p7iocCsr:             %016llx\n", data->p7iocCsr);
+	pr_info("  lemFir:               %016llx\n", data->lemFir);
+	pr_info("  lemErrorMask:         %016llx\n", data->lemErrorMask);
+	pr_info("  lemWOF:               %016llx\n", data->lemWOF);
+	pr_info("  phbErrorStatus:       %016llx\n", data->phbErrorStatus);
+	pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
+	pr_info("  phbErrorLog0:         %016llx\n", data->phbErrorLog0);
+	pr_info("  phbErrorLog1:         %016llx\n", data->phbErrorLog1);
+	pr_info("  mmioErrorStatus:      %016llx\n", data->mmioErrorStatus);
+	pr_info("  mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus);
+	pr_info("  mmioErrorLog0:        %016llx\n", data->mmioErrorLog0);
+	pr_info("  mmioErrorLog1:        %016llx\n", data->mmioErrorLog1);
+	pr_info("  dma0ErrorStatus:      %016llx\n", data->dma0ErrorStatus);
+	pr_info("  dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus);
+	pr_info("  dma0ErrorLog0:        %016llx\n", data->dma0ErrorLog0);
+	pr_info("  dma0ErrorLog1:        %016llx\n", data->dma0ErrorLog1);
+	pr_info("  dma1ErrorStatus:      %016llx\n", data->dma1ErrorStatus);
+	pr_info("  dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus);
+	pr_info("  dma1ErrorLog0:        %016llx\n", data->dma1ErrorLog0);
+	pr_info("  dma1ErrorLog1:        %016llx\n", data->dma1ErrorLog1);
+
+	for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) {
+		if ((data->pestA[i] >> 63) == 0 &&
+		    (data->pestB[i] >> 63) == 0)
+			continue;
+
+		pr_info("  PE[%3d] PESTA:        %016llx\n", i, data->pestA[i]);
+		pr_info("          PESTB:        %016llx\n", data->pestB[i]);
+	}
+}
+
+static void pci_err_phb_diag_data(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct OpalIoPhbErrorCommon *common;
+	long ret;
+
+	common = (struct OpalIoPhbErrorCommon *)pci_err_diag;
+	ret = opal_pci_get_phb_diag_data2(phb->opal_id, common, PAGE_SIZE);
+	if (ret != OPAL_SUCCESS) {
+		pr_warning("%s: Failed to get diag-data for PHB#%x, ret=%ld\n",
+			   __func__, hose->global_number, ret);
+		return;
+	}
+
+	switch (common->ioType) {
+	case OPAL_PHB_ERROR_DATA_TYPE_P7IOC:
+		pci_err_phb_p7ioc_diag(hose, common);
+		break;
+	default:
+		pr_warning("%s: Unrecognized I/O chip %d\n",
+			   __func__, common->ioType);
+	}
+}
+
+/*
+ * Process PCI errors from IOC, PHB, or PE. Here's the list
+ * of expected error types and their severities, as well as
+ * the corresponding action.
+ *
+ * Type                        Severity                Action
+ * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_IOC_DEAD   panic
+ * OPAL_EEH_ERROR_IOC  OPAL_EEH_SEV_INF        diag_data
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_DEAD   panic
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_PHB_FENCED eeh
+ * OPAL_EEH_ERROR_PHB  OPAL_EEH_SEV_INF        diag_data
+ * OPAL_EEH_ERROR_PE   OPAL_EEH_SEV_PE_ER      eeh
+ */
+static void pci_err_process(struct pci_controller *hose,
+			u16 err_type, u16 severity, u16 pe_no)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	PCI_ERR_DBG("PCI_ERR: Process error (%d, %d, %d) on PHB#%x\n",
+		err_type, severity, pe_no, hose->global_number);
+
+	switch (err_type) {
+	case OPAL_EEH_IOC_ERROR:
+		if (severity == OPAL_EEH_SEV_IOC_DEAD) {
+			WARN(1, "EEH: dead IOC detected\n");
+			pci_err_dead_ioc();
+		} else if (severity == OPAL_EEH_SEV_INF)
+			pci_err_hub_diag_data(hose);
+
+		break;
+	case OPAL_EEH_PHB_ERROR:
+		if (severity == OPAL_EEH_SEV_PHB_DEAD) {
+			if (!phb->removed)
+				WARN(1, "EEH: dead PHB#%x detected\n",
+				     hose->global_number);
+			pci_err_dead_phb(hose);
+		} else if (severity == OPAL_EEH_SEV_PHB_FENCED)
+			pci_err_check_phb(hose);
+		else if (severity == OPAL_EEH_SEV_INF)
+			pci_err_phb_diag_data(hose);
+
+		break;
+	case OPAL_EEH_PE_ERROR:
+		pci_err_check_pe(hose, pe_no);
+		break;
+	}
+}
+
+static int pci_err_handler(void *dummy)
+{
+	struct pnv_phb *phb;
+	struct pci_controller *hose, *tmp;
+	u64 frozen_pe_no;
+	u16 err_type, severity;
+	long ret;
+
+	while (!kthread_should_stop()) {
+		down(&pci_err_int_sem);
+		PCI_ERR_DBG("PCI_ERR: Get PCI error semaphore\n");
+
+		list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+			/*
+			 * If the subordinate PCI buses of the PHB has been
+			 * removed, we needn't take care of it any more.
+			 */
+			phb = hose->private_data;
+			if (phb->removed)
+				continue;
+
+			ret = opal_pci_next_error(phb->opal_id,
+					&frozen_pe_no, &err_type, &severity);
+
+			/* If OPAL API returns error, we needn't proceed */
+			if (ret != OPAL_SUCCESS) {
+				PCI_ERR_DBG("PCI_ERR: Invalid return value on "
+					    "PHB#%x (0x%lx) from opal_pci_next_error",
+					    hose->global_number, ret);
+				continue;
+			}
+
+			/* If the PHB doesn't have error, stop processing */
+			if (err_type == OPAL_EEH_NO_ERROR ||
+			    severity == OPAL_EEH_SEV_NO_ERROR) {
+				PCI_ERR_DBG("PCI_ERR: No error found on PHB#%x\n",
+					hose->global_number);
+				continue;
+			}
+
+			/*
+			 * Processing the error. We're expecting the error with
+			 * highest priority reported upon multiple errors on the
+			 * specific PHB.
+			 */
+			pci_err_process(hose, err_type, severity, frozen_pe_no);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * pci_err_init - Initialize PCI error handling component
+ *
+ * It should be done before OPAL interrupts got registered because
+ * that depends on this.
+ */
+static int __init pci_err_init(void)
+{
+	int ret = 0;
+
+	if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+		pr_err("%s: FW_FEATURE_OPALv3 required!\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	pci_err_diag = (char *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+	if (!pci_err_diag) {
+		pr_err("%s: Failed to alloc memory for diag data\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	/* Initialize semaphore */
+	sema_init(&pci_err_int_sem, 0);
+
+	/* Start kthread */
+	pci_err_thread = kthread_run(pci_err_handler, NULL, "PCI_ERR");
+	if (IS_ERR(pci_err_thread)) {
+		ret = PTR_ERR(pci_err_thread);
+		free_page((unsigned long)pci_err_diag);
+		pr_err("%s: Failed to start kthread, ret=%d\n",
+			__func__, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+arch_initcall(pci_err_init);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 6f69b87..08d53b0 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -92,6 +92,7 @@ struct pnv_phb {
 #ifdef CONFIG_EEH
 	struct pnv_eeh_ops	*eeh_ops;
 	int			eeh_enabled;
+	int			removed;
 #endif
 
 #ifdef CONFIG_PCI_MSI
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (20 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

It's meaningless to handle frozen PE if we already had fenced PHB.
The patch intends to check the PHB state before checking PE. If the
PHB has been put into fenced state, we need take care of that firstly.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c |   60 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f7cbeae..bfd1c20 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -269,6 +269,58 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	return pa | (token & (PAGE_SIZE-1));
 }
 
+/*
+ * On PowerNV platform, we might already have fenced PHB there.
+ * For that case, it's meaningless to recover frozen PE. Intead,
+ * We have to handle fenced PHB firstly.
+ */
+static int eeh_phb_check_failure(struct eeh_pe *pe)
+{
+	struct eeh_pe *phb_pe;
+	unsigned long flags;
+	int ret;
+
+	if (!eeh_probe_mode_dev())
+		return -EPERM;
+
+	/* Find the PHB PE */
+	phb_pe = eeh_phb_pe_get(pe->phb);
+	if (!phb_pe) {
+		pr_warning("%s Can't find PE for PHB#%d\n",
+			   __func__, pe->phb->global_number);
+		return -EEXIST;
+	}
+
+	/* If the PHB has been in problematic state */
+	eeh_serialize_lock(&flags);
+	if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) {
+		ret = 0;
+		goto out;
+	}
+
+	/* Check PHB state */
+	ret = eeh_ops->get_state(phb_pe, NULL);
+	if ((ret < 0) ||
+	    (ret == EEH_STATE_NOT_SUPPORT) ||
+	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
+	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
+		ret = 0;
+		goto out;
+	}
+
+	/* Isolate the PHB and send event */
+	eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+	eeh_serialize_unlock(flags);
+	eeh_send_failure_event(phb_pe);
+
+	WARN(1, "EEH: PHB failure detected\n");
+
+	return 1;
+out:
+	eeh_serialize_unlock(flags);
+	return ret;
+}
+
 /**
  * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
  * @edev: eeh device
@@ -319,6 +371,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 		return 0;
 	}
 
+	/*
+	 * On PowerNV platform, we might already have fenced PHB
+	 * there and we need take care of that firstly.
+	 */
+	ret = eeh_phb_check_failure(pe);
+	if (ret > 0)
+		return ret;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 23/27] powernv/opal: Notifier for OPAL events
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (21 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

This patch implements a notifier to receive a notification on OPAL
event mask changes. The notifier is only called as a result of an OPAL
interrupt, which will happen upon reception of FSP messages or PCI errors.
Any event mask change detected as a result of opal_poll_events() will not
result in a notifier call.

[benh: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h       |    4 ++
 arch/powerpc/platforms/powernv/opal.c |   74 ++++++++++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 2880797..c5803c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -644,6 +644,10 @@ extern void hvc_opal_init_early(void);
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
 
+extern int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t));
+extern void opal_notifier_disable(void);
+extern void opal_notifier_enable(void);
+
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
 
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 628c564..9e4c9e9 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -26,11 +26,21 @@ struct opal {
 	u64 entry;
 } opal;
 
+struct opal_cb {
+	struct list_head list;
+	uint64_t mask;
+	void (*cb)(uint64_t);
+};
+
 static struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
 extern u64 opal_mc_secondary_handler[];
 static unsigned int *opal_irqs;
 static unsigned int opal_irq_count;
+static LIST_HEAD(opal_notifier);
+static DEFINE_SPINLOCK(opal_notifier_lock);
+static uint64_t last_notified_mask = 0x0ul;
+static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
 
 int __init early_init_dt_scan_opal(unsigned long node,
 				   const char *uname, int depth, void *data)
@@ -95,6 +105,68 @@ static int __init opal_register_exception_handlers(void)
 
 early_initcall(opal_register_exception_handlers);
 
+int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t))
+{
+	unsigned long flags;
+	struct opal_cb *p;
+
+	if (!mask || !cb) {
+		pr_warning("%s: Invalid argument (%llx, %p)!\n",
+			__func__, mask, cb);
+		return -EINVAL;
+	}
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p) {
+		pr_warning("%s: Out of memory (%llx, %p)!\n",
+			__func__, mask, cb);
+		return -ENOMEM;
+	}
+	p->mask = mask;
+	p->cb   = cb;
+
+	spin_lock_irqsave(&opal_notifier_lock, flags);
+	list_add_tail(&p->list, &opal_notifier);
+	spin_unlock_irqrestore(&opal_notifier_lock, flags);
+
+	return 0;
+}
+
+static void opal_do_notifier(uint64_t events)
+{
+	struct opal_cb *p;
+	uint64_t changed_mask;
+
+	if (atomic_read(&opal_notifier_hold))
+		return;
+
+	changed_mask = last_notified_mask ^ events;
+	last_notified_mask = events;
+
+	list_for_each_entry(p, &opal_notifier, list) {
+		if (changed_mask & p->mask)
+			p->cb(events);
+	}
+}
+
+void opal_notifier_disable(void)
+{
+	atomic_set(&opal_notifier_hold, 1);
+}
+
+void opal_notifier_enable(void)
+{
+	int64_t rc;
+	uint64_t evt = 0;
+
+	atomic_set(&opal_notifier_hold, 0);
+
+	/* Process pending events */
+	rc = opal_poll_events(&evt);
+	if (rc == OPAL_SUCCESS && evt)
+		opal_do_notifier(evt);
+}
+
 int opal_get_chars(uint32_t vtermno, char *buf, int count)
 {
 	s64 len, rc;
@@ -297,7 +369,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
 
 	opal_handle_interrupt(virq_to_hw(irq), &events);
 
-	/* XXX TODO: Do something with the events */
+	opal_do_notifier(events);
 
 	return IRQ_HANDLED;
 }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (22 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

While we're restarting or powering off the system, we needn't
the OPAL notifier any more. So just to disable that.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index d4459bf..84438af 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -93,6 +93,8 @@ static void  __noreturn pnv_restart(char *cmd)
 {
 	long rc = OPAL_BUSY;
 
+	opal_notifier_disable();
+
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
 		rc = opal_cec_reboot();
 		if (rc == OPAL_BUSY_EVENT)
@@ -108,6 +110,8 @@ static void __noreturn pnv_power_off(void)
 {
 	long rc = OPAL_BUSY;
 
+	opal_notifier_disable();
+
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
 		rc = opal_cec_power_down(0);
 		if (rc == OPAL_BUSY_EVENT)
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (23 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
  2013-06-15  9:03 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch intends to register OPAL event notifier and process the
PCI errors from firmware. If we have pending PCI errors, the kthread
will be invoked to handle that in turn.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-err.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
index e54135b..9b5c4ae 100644
--- a/arch/powerpc/platforms/powernv/pci-err.c
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -425,6 +425,13 @@ static void pci_err_process(struct pci_controller *hose,
 	}
 }
 
+static void pci_err_event(u64 event)
+{
+	/* Notify kthread to process error */
+	if (event & OPAL_EVENT_PCI_ERROR)
+		up(&pci_err_int_sem);
+}
+
 static int pci_err_handler(void *dummy)
 {
 	struct pnv_phb *phb;
@@ -513,6 +520,16 @@ static int __init pci_err_init(void)
 		return ret;
 	}
 
+	/* Register OPAL event notifier */
+	ret = opal_notifier_register(OPAL_EVENT_PCI_ERROR, pci_err_event);
+	if (ret) {
+		kthread_stop(pci_err_thread);
+		free_page((unsigned long)pci_err_diag);
+		pr_err("%s: Failed to register OPAL notifier, rc=%d\n",
+		        __func__, ret);
+		return ret;
+	}
+
 	return 0;
 }
 
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (24 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  2013-06-15  9:03 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch creates one debugfs directory ("powerpc/PCIxxxx") for
each PHB so that we can hook EEH error injection debugfs entry
there in proceeding patch.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   22 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci.h      |    4 ++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 48b0940..0d9d302 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/string.h>
 #include <linux/init.h>
@@ -968,12 +969,33 @@ static void pnv_pci_ioda_setup_DMA(void)
 	}
 }
 
+static void pnv_pci_ioda_create_dbgfs(void)
+{
+#ifdef CONFIG_DEBUG_FS
+	struct pci_controller *hose, *tmp;
+	struct pnv_phb *phb;
+	char name[16];
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+
+		sprintf(name, "PCI%04x", hose->global_number);
+		phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
+		if (!phb->dbgfs)
+			pr_warning("%s: Error on creating debugfs on PHB#%x\n",
+				__func__, hose->global_number);
+	}
+#endif /* CONFIG_DEBUG_FS */
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
 	pnv_pci_ioda_setup_PEs();
 	pnv_pci_ioda_setup_seg();
 	pnv_pci_ioda_setup_DMA();
 
+	pnv_pci_ioda_create_dbgfs();
+
 #ifdef CONFIG_EEH
 	eeh_addr_cache_build();
 	eeh_init();
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 08d53b0..d3d67e1 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -95,6 +95,10 @@ struct pnv_phb {
 	int			removed;
 #endif
 
+#ifdef CONFIG_DEBUG_FS
+	struct dentry		*dbgfs;
+#endif
+
 #ifdef CONFIG_PCI_MSI
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 27/27] powerpc/eeh: Debugfs for error injection
  2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
                   ` (25 preceding siblings ...)
  2013-06-15  9:03 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
@ 2013-06-15  9:03 ` Gavin Shan
  26 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-15  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for
injecting EEH errors for testing purpose.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   33 ++++++++++++++++++++++++++++-
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 95f7d96..ff7a504 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/bootmem.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/io.h>
@@ -34,6 +35,29 @@
 #include "powernv.h"
 #include "pci.h"
 
+#ifdef CONFIG_DEBUG_FS
+static int ioda_eeh_dbgfs_set(void *data, u64 val)
+{
+	struct pci_controller *hose = data;
+	struct pnv_phb *phb = hose->private_data;
+
+	out_be64(phb->regs + 0xD10, val);
+	return 0;
+}
+
+static int ioda_eeh_dbgfs_get(void *data, u64 *val)
+{
+	struct pci_controller *hose = data;
+	struct pnv_phb *phb = hose->private_data;
+
+	*val = in_be64(phb->regs + 0xD10);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_dbgfs_ops, ioda_eeh_dbgfs_get,
+			ioda_eeh_dbgfs_set, "0x%llx\n");
+#endif /* CONFIG_DEBUG_FS */
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -47,8 +71,15 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	struct pnv_phb *phb = hose->private_data;
 
 	/* FIXME: Enable it for PHB3 later */
-	if (phb->type == PNV_PHB_IODA1)
+	if (phb->type == PNV_PHB_IODA1) {
+#ifdef CONFIG_DEBUG_FS
+		if (phb->dbgfs)
+			debugfs_create_file("err_injct", 0600,
+				phb->dbgfs, hose, &ioda_eeh_dbgfs_ops);
+#endif
+
 		phb->eeh_enabled = 1;
+	}
 
 	return 0;
 }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-15  9:03 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
@ 2013-06-16  5:12   ` Benjamin Herrenschmidt
  2013-06-16  7:27     ` Gavin Shan
  0 siblings, 1 reply; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-16  5:12 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Sat, 2013-06-15 at 17:03 +0800, Gavin Shan wrote:
> On PowerNV platform, the EEH event is produced either by detect
> on accessing config or I/O registers, or by interrupts dedicated
> for EEH report. The patch adds support to process the interrupts
> dedicated for EEH report.
> 
> Firstly, the kernel thread will be waken up to process incoming
> interrupt. The PHBs will be scanned one by one to process all
> existing EEH errors. Besides, There're mulple EEH errors that can
> be reported from interrupts and we have differentiated actions
> against them:
> 
> - If the IOC is dead, all PCI buses under all PHBs will be removed
>   from the system.
> - If the PHB is dead, all PCI buses under the PHB will be removed
>   from the system.
> - If the PHB is fenced, EEH event will be sent to EEH core and
>   the fenced PHB is expected to be resetted completely.
> - If specific PE has been put into frozen state, EEH event will
>   be sent to EEH core so that the PE will be resetted.
> - If the error is informational one, we just output the related
>   registers for debugging purpose and no more action will be
>   taken.

Getting better.... but:

 - I still don't like having a kthread for that. Why not use schedule_work() ?

 - We already have an EEH thread, why not just use it ? IE send it a special
type of message that makes it query the backend for error info instead ?

 - I'm not fan of exposing that EEH private lock. I don't entirely understand
why you need to do that either.

Generally speaking, I'm thinking this file should contain less stuff, most of
it should move into the ioda backend, the interrupt just turning into some
request down to the existing EEH thread.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-16  5:12   ` Benjamin Herrenschmidt
@ 2013-06-16  7:27     ` Gavin Shan
  2013-06-16  8:37       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 34+ messages in thread
From: Gavin Shan @ 2013-06-16  7:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan

On Sun, Jun 16, 2013 at 03:12:11PM +1000, Benjamin Herrenschmidt wrote:
>On Sat, 2013-06-15 at 17:03 +0800, Gavin Shan wrote:
>> On PowerNV platform, the EEH event is produced either by detect
>> on accessing config or I/O registers, or by interrupts dedicated
>> for EEH report. The patch adds support to process the interrupts
>> dedicated for EEH report.
>> 
>> Firstly, the kernel thread will be waken up to process incoming
>> interrupt. The PHBs will be scanned one by one to process all
>> existing EEH errors. Besides, There're mulple EEH errors that can
>> be reported from interrupts and we have differentiated actions
>> against them:
>> 
>> - If the IOC is dead, all PCI buses under all PHBs will be removed
>>   from the system.
>> - If the PHB is dead, all PCI buses under the PHB will be removed
>>   from the system.
>> - If the PHB is fenced, EEH event will be sent to EEH core and
>>   the fenced PHB is expected to be resetted completely.
>> - If specific PE has been put into frozen state, EEH event will
>>   be sent to EEH core so that the PE will be resetted.
>> - If the error is informational one, we just output the related
>>   registers for debugging purpose and no more action will be
>>   taken.
>

Thanks for the review, Ben.

>Getting better.... but:
>
> - I still don't like having a kthread for that. Why not use schedule_work() ?
>

Ok. Will update it with schedule_work() in next revision :-)

> - We already have an EEH thread, why not just use it ? IE send it a special
>type of message that makes it query the backend for error info instead ?
>

Ok. I'll try to do as you suggested in next revision. Something like:

	- Interrupt comes in
	- OPAL notifier callback
	- Mark all PHB and its subordinate PEs "isolated" since we don't know
	  which PHB/PE has problems (Note: we still need eeh_serialize_lock())
	- Create an EEH event without binding PE to EEH core.
	- EEH core starts new kthread and calls to next_error() backend
	  and handle the EEH errors accordingly.
	  
	  * Informational errors: clear PHB "isolated" state and output diag-data
	    in backend (in eeh-ioda.c as you suggested).
	  * Fenced PHB: PHB complete reset by EEH core and "isolated" state will
	    be cleared during the reset automatically.
	  * Dead PHB: Remove the PHB and its subordinate PCI buses/devices from
		      the system.
	  * Dead IOC: Remove PCI domain from the system.

The problem with the scheme is that the PHB's state can't reflect the real state
any more. For example, PHB#0 has been fenced, but PHB#1 is normal state. We have
to mark all PHBs as "isolated" (fenced) since we don't know which PHB is encountering
problems in the OPAL notifier callback.

I think it would work well. Let me have a try to change the code and make it
workable. The side-effect would be introducing more logic to EEH core and it's
shared by multiple platforms (powernv, pseries, powerkvm guest in future). So
my initial though is making opal_pci_next_error() invisible from EEH core and
make the EEH core totally event-driven :-)

> - I'm not fan of exposing that EEH private lock. I don't entirely understand
>why you need to do that either.
>

It's used to get consistent PE isolated state, which is protected by the lock.
Without it, we would have following case. Since we're going to change the
PE's state in platform code (pci-err.c), we need the lock to protect the PE's
state.

	
		    CPU#0				CPU#1
	PCI-CFG read returns 0xFF's		PCI-CFG read returns 0xFF's
	PE not fenced				PE not fenced
	PE marked as fenced			PE marked as fenced
	EEH event to EEH core			EEH event to EEH core

>Generally speaking, I'm thinking this file should contain less stuff, most of
>it should move into the ioda backend, the interrupt just turning into some
>request down to the existing EEH thread.
>

Yeah, I'll move most of the stuff into eeh-ioda.c with above scheme applied :-)

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
  2013-06-16  7:27     ` Gavin Shan
@ 2013-06-16  8:37       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 34+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-16  8:37 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Sun, 2013-06-16 at 15:27 +0800, Gavin Shan wrote:

> Thanks for the review, Ben.
> 
> >Getting better.... but:
> >
> > - I still don't like having a kthread for that. Why not use schedule_work() ?
> >
> 
> Ok. Will update it with schedule_work() in next revision :-)
> 
> > - We already have an EEH thread, why not just use it ? IE send it a special
> >type of message that makes it query the backend for error info instead ?
> >
> 
> Ok. I'll try to do as you suggested in next revision. Something like:
> 
> 	- Interrupt comes in
> 	- OPAL notifier callback
> 	- Mark all PHB and its subordinate PEs "isolated" since we don't know
> 	  which PHB/PE has problems (Note: we still need eeh_serialize_lock())

No, don't mark anything. It wouldn't be good to start marking "isolated"
things that aren't. It doesn't matter if we don't "know" they are
isolated just yet. Just "poke" the EEH thread with a different type of
message from the current one.

> 	- Create an EEH event without binding PE to EEH core.
> 	- EEH core starts new kthread and calls to next_error() backend
> 	  and handle the EEH errors accordingly.

No need for a new kthread, EEH core already runs in one, doesn't it ? or
am I missing something here ? And yes, if EEH core doesn't get a PE in
the message, then it can call a backend function along the lines of
"next_error()" which ... returns a PE, and does whatever additional
processing we want to do for PHBs.

That can also return "nothing to do" (INF).
	  
> 	  * Informational errors: clear PHB "isolated" state and output diag-data
> 	    in backend (in eeh-ioda.c as you suggested).

Don't mark isolated in the first place.

> 	  * Fenced PHB: PHB complete reset by EEH core and "isolated" state will
> 	    be cleared during the reset automatically.
> 	  * Dead PHB: Remove the PHB and its subordinate PCI buses/devices from
> 		      the system.
> 	  * Dead IOC: Remove PCI domain from the system.

Might want to have return codes from next_error for doing that from the
core, up to you, not a big deal.

> The problem with the scheme is that the PHB's state can't reflect the real state
> any more. For example, PHB#0 has been fenced, but PHB#1 is normal state. We have
> to mark all PHBs as "isolated" (fenced) since we don't know which PHB is encountering
> problems in the OPAL notifier callback.

No, we don't have to mark anything. The interrupt is an asynchronous
thing anyway, so the effect in practice is that we'll react to it a
little bit later, but it's already coming an undefined amount of time
after the error anyway so we may as well deal with it, and that helps
with the synchronization between detection via the interrupt vs.
detection (of the same error) via the MMIO reads.

> I think it would work well. Let me have a try to change the code and make it
> workable. The side-effect would be introducing more logic to EEH core and it's
> shared by multiple platforms (powernv, pseries, powerkvm guest in future). So
> my initial though is making opal_pci_next_error() invisible from EEH core and
> make the EEH core totally event-driven :-)
> 
> > - I'm not fan of exposing that EEH private lock. I don't entirely understand
> >why you need to do that either.
> >
> 
> It's used to get consistent PE isolated state, which is protected by the lock.
> Without it, we would have following case. Since we're going to change the
> PE's state in platform code (pci-err.c), we need the lock to protect the PE's
> state.

I'd rather you expose get/set_state functions and keep the lock local if
that makes sense but first look at what I've proposed above. It might be
that you are right and the lock must be exposed.

> 	
> 		    CPU#0				CPU#1
> 	PCI-CFG read returns 0xFF's		PCI-CFG read returns 0xFF's
> 	PE not fenced				PE not fenced
> 	PE marked as fenced			PE marked as fenced
> 	EEH event to EEH core			EEH event to EEH core
> 
> >Generally speaking, I'm thinking this file should contain less stuff, most of
> >it should move into the ioda backend, the interrupt just turning into some
> >request down to the existing EEH thread.
> >
> 
> Yeah, I'll move most of the stuff into eeh-ioda.c with above scheme applied :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/27] powerpc/eeh: Move common part to kernel directory
  2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
@ 2013-06-17  3:03   ` Mike Qiu
  2013-06-18  0:55     ` Gavin Shan
  0 siblings, 1 reply; 34+ messages in thread
From: Mike Qiu @ 2013-06-17  3:03 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

于 2013/6/15 17:02, Gavin Shan 写道:
> The patch moves the common part of EEH core into arch/powerpc/kernel
> directory so that we needn't PPC_PSERIES while compiling POWERNV
> platform:
>
> 	* Move the EEH common part into arch/powerpc/kernel
> 	* Move the functions for PCI hotplug from pSeries platform to
> 	  arch/powerpc/kernel/pci_hotplug.c
> 	* Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to
> 	  arch/powerpc/platforms/Kconfig
> 	* Adjust makefile accordingly
>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/Makefile                |    4 +-
>   arch/powerpc/kernel/eeh.c                   |  942 +++++++++++++++++++++++++++
>   arch/powerpc/kernel/eeh_cache.c             |  319 +++++++++
>   arch/powerpc/kernel/eeh_dev.c               |  112 ++++
>   arch/powerpc/kernel/eeh_driver.c            |  552 ++++++++++++++++
>   arch/powerpc/kernel/eeh_event.c             |  142 ++++
>   arch/powerpc/kernel/eeh_pe.c                |  653 +++++++++++++++++++
>   arch/powerpc/kernel/eeh_sysfs.c             |   75 +++
>   arch/powerpc/kernel/pci_hotplug.c           |  111 ++++
>   arch/powerpc/platforms/Kconfig              |    5 +
>   arch/powerpc/platforms/pseries/Kconfig      |    5 -
>   arch/powerpc/platforms/pseries/Makefile     |    4 +-
>   arch/powerpc/platforms/pseries/eeh.c        |  942 ---------------------------
>   arch/powerpc/platforms/pseries/eeh_cache.c  |  319 ---------
>   arch/powerpc/platforms/pseries/eeh_dev.c    |  112 ----
>   arch/powerpc/platforms/pseries/eeh_driver.c |  552 ----------------
>   arch/powerpc/platforms/pseries/eeh_event.c  |  142 ----
>   arch/powerpc/platforms/pseries/eeh_pe.c     |  653 -------------------
>   arch/powerpc/platforms/pseries/eeh_sysfs.c  |   75 ---
>   arch/powerpc/platforms/pseries/pci_dlpar.c  |   85 ---
>   20 files changed, 2915 insertions(+), 2889 deletions(-)
>   create mode 100644 arch/powerpc/kernel/eeh.c
>   create mode 100644 arch/powerpc/kernel/eeh_cache.c
>   create mode 100644 arch/powerpc/kernel/eeh_dev.c
>   create mode 100644 arch/powerpc/kernel/eeh_driver.c
>   create mode 100644 arch/powerpc/kernel/eeh_event.c
>   create mode 100644 arch/powerpc/kernel/eeh_pe.c
>   create mode 100644 arch/powerpc/kernel/eeh_sysfs.c
>   create mode 100644 arch/powerpc/kernel/pci_hotplug.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_cache.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_dev.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_driver.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_event.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_pe.c
>   delete mode 100644 arch/powerpc/platforms/pseries/eeh_sysfs.c
>
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index f960a79..5826906 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -58,6 +58,8 @@ obj-$(CONFIG_RTAS_PROC)		+= rtas-proc.o
>   obj-$(CONFIG_LPARCFG)		+= lparcfg.o
>   obj-$(CONFIG_IBMVIO)		+= vio.o
>   obj-$(CONFIG_IBMEBUS)           += ibmebus.o
> +obj-$(CONFIG_EEH)		+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
> +				   eeh_driver.o eeh_event.o eeh_sysfs.o
>   obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
>   obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
>   obj-$(CONFIG_FA_DUMP)		+= fadump.o
> @@ -100,7 +102,7 @@ obj-$(CONFIG_PPC_UDBG_16550)	+= legacy_serial.o udbg_16550.o
>   obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
>   obj-$(CONFIG_SWIOTLB)		+= dma-swiotlb.o
>
> -pci64-$(CONFIG_PPC64)		+= pci_dn.o isa-bridge.o
> +pci64-$(CONFIG_PPC64)		+= pci_hotplug.o pci_dn.o isa-bridge.o
>   obj-$(CONFIG_PCI)		+= pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \
>   				   pci-common.o pci_of_scan.o
>   obj-$(CONFIG_PCI_MSI)		+= msi.o
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> new file mode 100644
> index 0000000..6b73d6c
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -0,0 +1,942 @@
> +/*
> + * Copyright IBM Corporation 2001, 2005, 2006
> + * Copyright Dave Engebretsen & Todd Inglett 2001
> + * Copyright Linas Vepstas 2005, 2006
> + * Copyright 2001-2012 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + *
> + * Please address comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/sched.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/pci.h>
> +#include <linux/proc_fs.h>
> +#include <linux/rbtree.h>
> +#include <linux/seq_file.h>
> +#include <linux/spinlock.h>
> +#include <linux/export.h>
> +#include <linux/of.h>
> +
> +#include <linux/atomic.h>
> +#include <asm/eeh.h>
> +#include <asm/eeh_event.h>
> +#include <asm/io.h>
> +#include <asm/machdep.h>
> +#include <asm/ppc-pci.h>
> +#include <asm/rtas.h>
> +
> +
> +/** Overview:
> + *  EEH, or "Extended Error Handling" is a PCI bridge technology for
> + *  dealing with PCI bus errors that can't be dealt with within the
> + *  usual PCI framework, except by check-stopping the CPU.  Systems
> + *  that are designed for high-availability/reliability cannot afford
> + *  to crash due to a "mere" PCI error, thus the need for EEH.
> + *  An EEH-capable bridge operates by converting a detected error
> + *  into a "slot freeze", taking the PCI adapter off-line, making
> + *  the slot behave, from the OS'es point of view, as if the slot
> + *  were "empty": all reads return 0xff's and all writes are silently
> + *  ignored.  EEH slot isolation events can be triggered by parity
> + *  errors on the address or data busses (e.g. during posted writes),
> + *  which in turn might be caused by low voltage on the bus, dust,
> + *  vibration, humidity, radioactivity or plain-old failed hardware.
> + *
> + *  Note, however, that one of the leading causes of EEH slot
> + *  freeze events are buggy device drivers, buggy device microcode,
> + *  or buggy device hardware.  This is because any attempt by the
> + *  device to bus-master data to a memory address that is not
> + *  assigned to the device will trigger a slot freeze.   (The idea
> + *  is to prevent devices-gone-wild from corrupting system memory).
> + *  Buggy hardware/drivers will have a miserable time co-existing
> + *  with EEH.
> + *
> + *  Ideally, a PCI device driver, when suspecting that an isolation
> + *  event has occurred (e.g. by reading 0xff's), will then ask EEH
> + *  whether this is the case, and then take appropriate steps to
> + *  reset the PCI slot, the PCI device, and then resume operations.
> + *  However, until that day,  the checking is done here, with the
> + *  eeh_check_failure() routine embedded in the MMIO macros.  If
> + *  the slot is found to be isolated, an "EEH Event" is synthesized
> + *  and sent out for processing.
> + */
> +
> +/* If a device driver keeps reading an MMIO register in an interrupt
> + * handler after a slot isolation event, it might be broken.
> + * This sets the threshold for how many read attempts we allow
> + * before printing an error message.
> + */
> +#define EEH_MAX_FAILS	2100000
> +
> +/* Time to wait for a PCI slot to report status, in milliseconds */
> +#define PCI_BUS_RESET_WAIT_MSEC (60*1000)
> +
> +/* Platform dependent EEH operations */
> +struct eeh_ops *eeh_ops = NULL;
> +
> +int eeh_subsystem_enabled;
> +EXPORT_SYMBOL(eeh_subsystem_enabled);
> +
> +/*
> + * EEH probe mode support. The intention is to support multiple
> + * platforms for EEH. Some platforms like pSeries do PCI emunation
> + * based on device tree. However, other platforms like powernv probe
> + * PCI devices from hardware. The flag is used to distinguish that.
> + * In addition, struct eeh_ops::probe would be invoked for particular
> + * OF node or PCI device so that the corresponding PE would be created
> + * there.
> + */
> +int eeh_probe_mode;
> +
> +/* Global EEH mutex */
> +DEFINE_MUTEX(eeh_mutex);
> +
> +/* Lock to avoid races due to multiple reports of an error */
> +static DEFINE_RAW_SPINLOCK(confirm_error_lock);
> +
> +/* Buffer for reporting pci register dumps. Its here in BSS, and
> + * not dynamically alloced, so that it ends up in RMO where RTAS
> + * can access it.
> + */
> +#define EEH_PCI_REGS_LOG_LEN 4096
> +static unsigned char pci_regs_buf[EEH_PCI_REGS_LOG_LEN];
> +
> +/*
> + * The struct is used to maintain the EEH global statistic
> + * information. Besides, the EEH global statistics will be
> + * exported to user space through procfs
> + */
> +struct eeh_stats {
> +	u64 no_device;		/* PCI device not found		*/
> +	u64 no_dn;		/* OF node not found		*/
> +	u64 no_cfg_addr;	/* Config address not found	*/
> +	u64 ignored_check;	/* EEH check skipped		*/
> +	u64 total_mmio_ffs;	/* Total EEH checks		*/
> +	u64 false_positives;	/* Unnecessary EEH checks	*/
> +	u64 slot_resets;	/* PE reset			*/
> +};
> +
> +static struct eeh_stats eeh_stats;
> +
> +#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
> +
> +/**
> + * eeh_gather_pci_data - Copy assorted PCI config space registers to buff
> + * @edev: device to report data for
> + * @buf: point to buffer in which to log
> + * @len: amount of room in buffer
> + *
> + * This routine captures assorted PCI configuration space data,
> + * and puts them into a buffer for RTAS error logging.
> + */
> +static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
> +{
> +	struct device_node *dn = eeh_dev_to_of_node(edev);
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	u32 cfg;
> +	int cap, i;
> +	int n = 0;
> +
> +	n += scnprintf(buf+n, len-n, "%s\n", dn->full_name);
> +	printk(KERN_WARNING "EEH: of node=%s\n", dn->full_name);
> +
> +	eeh_ops->read_config(dn, PCI_VENDOR_ID, 4, &cfg);
> +	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
> +	printk(KERN_WARNING "EEH: PCI device/vendor: %08x\n", cfg);
> +
> +	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cfg);
> +	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
> +	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);
> +
> +	if (!dev) {
> +		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
> +		return n;
> +	}
> +
> +	/* Gather bridge-specific registers */
> +	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
> +		eeh_ops->read_config(dn, PCI_SEC_STATUS, 2, &cfg);
> +		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
> +		printk(KERN_WARNING "EEH: Bridge secondary status: %04x\n", cfg);
> +
> +		eeh_ops->read_config(dn, PCI_BRIDGE_CONTROL, 2, &cfg);
> +		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
> +		printk(KERN_WARNING "EEH: Bridge control: %04x\n", cfg);
> +	}
> +
> +	/* Dump out the PCI-X command and status regs */
> +	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
BTW, when move common part , here you could use dev->pcie_cap at your 
convenience, and pcie_cap has
been initialized in of_create_pci_dev--->set_pcie_port_type
> +	if (cap) {
> +		eeh_ops->read_config(dn, cap, 4, &cfg);
> +		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
> +		printk(KERN_WARNING "EEH: PCI-X cmd: %08x\n", cfg);
> +
> +		eeh_ops->read_config(dn, cap+4, 4, &cfg);
> +		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
> +		printk(KERN_WARNING "EEH: PCI-X status: %08x\n", cfg);
> +	}
> +
> +	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
> +	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
> +	if (cap) {
> +		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
> +		printk(KERN_WARNING
> +		       "EEH: PCI-E capabilities and status follow:\n");
> +
> +		for (i=0; i<=8; i++) {
> +			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
> +			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
> +			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
> +		}
> +
> +		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> +		if (cap) {
> +			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
> +			printk(KERN_WARNING
> +			       "EEH: PCI-E AER capability register set follows:\n");
> +
> +			for (i=0; i<14; i++) {
> +				eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
> +				n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
> +				printk(KERN_WARNING "EEH: PCI-E AER %02x: %08x\n", i, cfg);
> +			}
> +		}
> +	}
> +
> +	return n;
> +}
> +
> +/**
> + * eeh_slot_error_detail - Generate combined log including driver log and error log
> + * @pe: EEH PE
> + * @severity: temporary or permanent error log
> + *
> + * This routine should be called to generate the combined log, which
> + * is comprised of driver log and error log. The driver log is figured
> + * out from the config space of the corresponding PCI device, while
> + * the error log is fetched through platform dependent function call.
> + */
> +void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
> +{
> +	size_t loglen = 0;
> +	struct eeh_dev *edev;
> +
> +	eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
> +	eeh_ops->configure_bridge(pe);
> +	eeh_pe_restore_bars(pe);
> +
> +	pci_regs_buf[0] = 0;
> +	eeh_pe_for_each_dev(pe, edev) {
> +		loglen += eeh_gather_pci_data(edev, pci_regs_buf,
> +				EEH_PCI_REGS_LOG_LEN);
> +        }
> +
> +	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
> +}
> +
> +/**
> + * eeh_token_to_phys - Convert EEH address token to phys address
> + * @token: I/O token, should be address in the form 0xA....
> + *
> + * This routine should be called to convert virtual I/O address
> + * to physical one.
> + */
> +static inline unsigned long eeh_token_to_phys(unsigned long token)
> +{
> +	pte_t *ptep;
> +	unsigned long pa;
> +
> +	ptep = find_linux_pte(init_mm.pgd, token);
> +	if (!ptep)
> +		return token;
> +	pa = pte_pfn(*ptep) << PAGE_SHIFT;
> +
> +	return pa | (token & (PAGE_SIZE-1));
> +}
> +
> +/**
> + * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
> + * @edev: eeh device
> + *
> + * Check for an EEH failure for the given device node.  Call this
> + * routine if the result of a read was all 0xff's and you want to
> + * find out if this is due to an EEH slot freeze.  This routine
> + * will query firmware for the EEH status.
> + *
> + * Returns 0 if there has not been an EEH error; otherwise returns
> + * a non-zero value and queues up a slot isolation event notification.
> + *
> + * It is safe to call this routine in an interrupt context.
> + */
> +int eeh_dev_check_failure(struct eeh_dev *edev)
> +{
> +	int ret;
> +	unsigned long flags;
> +	struct device_node *dn;
> +	struct pci_dev *dev;
> +	struct eeh_pe *pe;
> +	int rc = 0;
> +	const char *location;
> +
> +	eeh_stats.total_mmio_ffs++;
> +
> +	if (!eeh_subsystem_enabled)
> +		return 0;
> +
> +	if (!edev) {
> +		eeh_stats.no_dn++;
> +		return 0;
> +	}
> +	dn = eeh_dev_to_of_node(edev);
> +	dev = eeh_dev_to_pci_dev(edev);
> +	pe = edev->pe;
> +
> +	/* Access to IO BARs might get this far and still not want checking. */
> +	if (!pe) {
> +		eeh_stats.ignored_check++;
> +		pr_debug("EEH: Ignored check for %s %s\n",
> +			eeh_pci_name(dev), dn->full_name);
> +		return 0;
> +	}
> +
> +	if (!pe->addr && !pe->config_addr) {
> +		eeh_stats.no_cfg_addr++;
> +		return 0;
> +	}
> +
> +	/* If we already have a pending isolation event for this
> +	 * slot, we know it's bad already, we don't need to check.
> +	 * Do this checking under a lock; as multiple PCI devices
> +	 * in one slot might report errors simultaneously, and we
> +	 * only want one error recovery routine running.
> +	 */
> +	raw_spin_lock_irqsave(&confirm_error_lock, flags);
> +	rc = 1;
> +	if (pe->state & EEH_PE_ISOLATED) {
> +		pe->check_count++;
> +		if (pe->check_count % EEH_MAX_FAILS == 0) {
> +			location = of_get_property(dn, "ibm,loc-code", NULL);
> +			printk(KERN_ERR "EEH: %d reads ignored for recovering device at "
> +				"location=%s driver=%s pci addr=%s\n",
> +				pe->check_count, location,
> +				eeh_driver_name(dev), eeh_pci_name(dev));
> +			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
> +				eeh_driver_name(dev));
> +			dump_stack();
> +		}
> +		goto dn_unlock;
> +	}
> +
> +	/*
> +	 * Now test for an EEH failure.  This is VERY expensive.
> +	 * Note that the eeh_config_addr may be a parent device
> +	 * in the case of a device behind a bridge, or it may be
> +	 * function zero of a multi-function device.
> +	 * In any case they must share a common PHB.
> +	 */
> +	ret = eeh_ops->get_state(pe, NULL);
> +
> +	/* Note that config-io to empty slots may fail;
> +	 * they are empty when they don't have children.
> +	 * We will punt with the following conditions: Failure to get
> +	 * PE's state, EEH not support and Permanently unavailable
> +	 * state, PE is in good state.
> +	 */
> +	if ((ret < 0) ||
> +	    (ret == EEH_STATE_NOT_SUPPORT) ||
> +	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
> +	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
> +		eeh_stats.false_positives++;
> +		pe->false_positives++;
> +		rc = 0;
> +		goto dn_unlock;
> +	}
> +
> +	eeh_stats.slot_resets++;
> +
> +	/* Avoid repeated reports of this failure, including problems
> +	 * with other functions on this device, and functions under
> +	 * bridges.
> +	 */
> +	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
> +	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
> +
> +	eeh_send_failure_event(pe);
> +
> +	/* Most EEH events are due to device driver bugs.  Having
> +	 * a stack trace will help the device-driver authors figure
> +	 * out what happened.  So print that out.
> +	 */
> +	WARN(1, "EEH: failure detected\n");
> +	return 1;
> +
> +dn_unlock:
> +	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
> +	return rc;
> +}
> +
> +EXPORT_SYMBOL_GPL(eeh_dev_check_failure);
> +
> +/**
> + * eeh_check_failure - Check if all 1's data is due to EEH slot freeze
> + * @token: I/O token, should be address in the form 0xA....
> + * @val: value, should be all 1's (XXX why do we need this arg??)
> + *
> + * Check for an EEH failure at the given token address.  Call this
> + * routine if the result of a read was all 0xff's and you want to
> + * find out if this is due to an EEH slot freeze event.  This routine
> + * will query firmware for the EEH status.
> + *
> + * Note this routine is safe to call in an interrupt context.
> + */
> +unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
> +{
> +	unsigned long addr;
> +	struct eeh_dev *edev;
> +
> +	/* Finding the phys addr + pci device; this is pretty quick. */
> +	addr = eeh_token_to_phys((unsigned long __force) token);
> +	edev = eeh_addr_cache_get_dev(addr);
> +	if (!edev) {
> +		eeh_stats.no_device++;
> +		return val;
> +	}
> +
> +	eeh_dev_check_failure(edev);
> +
> +	pci_dev_put(eeh_dev_to_pci_dev(edev));
> +	return val;
> +}
> +
> +EXPORT_SYMBOL(eeh_check_failure);
> +
> +
> +/**
> + * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
> + * @pe: EEH PE
> + *
> + * This routine should be called to reenable frozen MMIO or DMA
> + * so that it would work correctly again. It's useful while doing
> + * recovery or log collection on the indicated device.
> + */
> +int eeh_pci_enable(struct eeh_pe *pe, int function)
> +{
> +	int rc;
> +
> +	rc = eeh_ops->set_option(pe, function);
> +	if (rc)
> +		pr_warning("%s: Unexpected state change %d on PHB#%d-PE#%x, err=%d\n",
> +			__func__, function, pe->phb->global_number, pe->addr, rc);
> +
> +	rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
> +	if (rc > 0 && (rc & EEH_STATE_MMIO_ENABLED) &&
> +	   (function == EEH_OPT_THAW_MMIO))
> +		return 0;
> +
> +	return rc;
> +}
> +
> +/**
> + * pcibios_set_pcie_slot_reset - Set PCI-E reset state
> + * @dev: pci device struct
> + * @state: reset state to enter
> + *
> + * Return value:
> + * 	0 if success
> + */
> +int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
> +{
> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> +	struct eeh_pe *pe = edev->pe;
> +
> +	if (!pe) {
> +		pr_err("%s: No PE found on PCI device %s\n",
> +			__func__, pci_name(dev));
> +		return -EINVAL;
> +	}
> +
> +	switch (state) {
> +	case pcie_deassert_reset:
> +		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
> +		break;
> +	case pcie_hot_reset:
> +		eeh_ops->reset(pe, EEH_RESET_HOT);
> +		break;
> +	case pcie_warm_reset:
> +		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
> +		break;
> +	default:
> +		return -EINVAL;
> +	};
> +
> +	return 0;
> +}
> +
> +/**
> + * eeh_set_pe_freset - Check the required reset for the indicated device
> + * @data: EEH device
> + * @flag: return value
> + *
> + * Each device might have its preferred reset type: fundamental or
> + * hot reset. The routine is used to collected the information for
> + * the indicated device and its children so that the bunch of the
> + * devices could be reset properly.
> + */
> +static void *eeh_set_dev_freset(void *data, void *flag)
> +{
> +	struct pci_dev *dev;
> +	unsigned int *freset = (unsigned int *)flag;
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +
> +	dev = eeh_dev_to_pci_dev(edev);
> +	if (dev)
> +		*freset |= dev->needs_freset;
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_reset_pe_once - Assert the pci #RST line for 1/4 second
> + * @pe: EEH PE
> + *
> + * Assert the PCI #RST line for 1/4 second.
> + */
> +static void eeh_reset_pe_once(struct eeh_pe *pe)
> +{
> +	unsigned int freset = 0;
> +
> +	/* Determine type of EEH reset required for
> +	 * Partitionable Endpoint, a hot-reset (1)
> +	 * or a fundamental reset (3).
> +	 * A fundamental reset required by any device under
> +	 * Partitionable Endpoint trumps hot-reset.
> +  	 */
> +	eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
> +
> +	if (freset)
> +		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
> +	else
> +		eeh_ops->reset(pe, EEH_RESET_HOT);
> +
> +	/* The PCI bus requires that the reset be held high for at least
> +	 * a 100 milliseconds. We wait a bit longer 'just in case'.
> +	 */
> +#define PCI_BUS_RST_HOLD_TIME_MSEC 250
> +	msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
> +	
> +	/* We might get hit with another EEH freeze as soon as the
> +	 * pci slot reset line is dropped. Make sure we don't miss
> +	 * these, and clear the flag now.
> +	 */
> +	eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
> +
> +	eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
> +
> +	/* After a PCI slot has been reset, the PCI Express spec requires
> +	 * a 1.5 second idle time for the bus to stabilize, before starting
> +	 * up traffic.
> +	 */
> +#define PCI_BUS_SETTLE_TIME_MSEC 1800
> +	msleep(PCI_BUS_SETTLE_TIME_MSEC);
> +}
> +
> +/**
> + * eeh_reset_pe - Reset the indicated PE
> + * @pe: EEH PE
> + *
> + * This routine should be called to reset indicated device, including
> + * PE. A PE might include multiple PCI devices and sometimes PCI bridges
> + * might be involved as well.
> + */
> +int eeh_reset_pe(struct eeh_pe *pe)
> +{
> +	int i, rc;
> +
> +	/* Take three shots at resetting the bus */
> +	for (i=0; i<3; i++) {
> +		eeh_reset_pe_once(pe);
> +
> +		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
> +		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
> +			return 0;
> +
> +		if (rc < 0) {
> +			pr_err("%s: Unrecoverable slot failure on PHB#%d-PE#%x",
> +				__func__, pe->phb->global_number, pe->addr);
> +			return -1;
> +		}
> +		pr_err("EEH: bus reset %d failed on PHB#%d-PE#%x, rc=%d\n",
> +			i+1, pe->phb->global_number, pe->addr, rc);
> +	}
> +
> +	return -1;
> +}
> +
> +/**
> + * eeh_save_bars - Save device bars
> + * @edev: PCI device associated EEH device
> + *
> + * Save the values of the device bars. Unlike the restore
> + * routine, this routine is *not* recursive. This is because
> + * PCI devices are added individually; but, for the restore,
> + * an entire slot is reset at a time.
> + */
> +void eeh_save_bars(struct eeh_dev *edev)
> +{
> +	int i;
> +	struct device_node *dn;
> +
> +	if (!edev)
> +		return;
> +	dn = eeh_dev_to_of_node(edev);
> +	
> +	for (i = 0; i < 16; i++)
> +		eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
> +}
> +
> +/**
> + * eeh_ops_register - Register platform dependent EEH operations
> + * @ops: platform dependent EEH operations
> + *
> + * Register the platform dependent EEH operation callback
> + * functions. The platform should call this function before
> + * any other EEH operations.
> + */
> +int __init eeh_ops_register(struct eeh_ops *ops)
> +{
> +	if (!ops->name) {
> +		pr_warning("%s: Invalid EEH ops name for %p\n",
> +			__func__, ops);
> +		return -EINVAL;
> +	}
> +
> +	if (eeh_ops && eeh_ops != ops) {
> +		pr_warning("%s: EEH ops of platform %s already existing (%s)\n",
> +			__func__, eeh_ops->name, ops->name);
> +		return -EEXIST;
> +	}
> +
> +	eeh_ops = ops;
> +
> +	return 0;
> +}
> +
> +/**
> + * eeh_ops_unregister - Unreigster platform dependent EEH operations
> + * @name: name of EEH platform operations
> + *
> + * Unregister the platform dependent EEH operation callback
> + * functions.
> + */
> +int __exit eeh_ops_unregister(const char *name)
> +{
> +	if (!name || !strlen(name)) {
> +		pr_warning("%s: Invalid EEH ops name\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	if (eeh_ops && !strcmp(eeh_ops->name, name)) {
> +		eeh_ops = NULL;
> +		return 0;
> +	}
> +
> +	return -EEXIST;
> +}
> +
> +/**
> + * eeh_init - EEH initialization
> + *
> + * Initialize EEH by trying to enable it for all of the adapters in the system.
> + * As a side effect we can determine here if eeh is supported at all.
> + * Note that we leave EEH on so failed config cycles won't cause a machine
> + * check.  If a user turns off EEH for a particular adapter they are really
> + * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
> + * grant access to a slot if EEH isn't enabled, and so we always enable
> + * EEH for all slots/all devices.
> + *
> + * The eeh-force-off option disables EEH checking globally, for all slots.
> + * Even if force-off is set, the EEH hardware is still enabled, so that
> + * newer systems can boot.
> + */
> +static int __init eeh_init(void)
> +{
> +	struct pci_controller *hose, *tmp;
> +	struct device_node *phb;
> +	int ret;
> +
> +	/* call platform initialization function */
> +	if (!eeh_ops) {
> +		pr_warning("%s: Platform EEH operation not found\n",
> +			__func__);
> +		return -EEXIST;
> +	} else if ((ret = eeh_ops->init())) {
> +		pr_warning("%s: Failed to call platform init function (%d)\n",
> +			__func__, ret);
> +		return ret;
> +	}
> +
> +	raw_spin_lock_init(&confirm_error_lock);
> +
> +	/* Enable EEH for all adapters */
> +	if (eeh_probe_mode_devtree()) {
> +		list_for_each_entry_safe(hose, tmp,
> +			&hose_list, list_node) {
> +			phb = hose->dn;
> +			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
> +		}
> +	}
> +
> +	if (eeh_subsystem_enabled)
> +		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> +	else
> +		pr_warning("EEH: No capable adapters found\n");
> +
> +	return ret;
> +}
> +
> +core_initcall_sync(eeh_init);
> +
> +/**
> + * eeh_add_device_early - Enable EEH for the indicated device_node
> + * @dn: device node for which to set up EEH
> + *
> + * This routine must be used to perform EEH initialization for PCI
> + * devices that were added after system boot (e.g. hotplug, dlpar).
> + * This routine must be called before any i/o is performed to the
> + * adapter (inluding any config-space i/o).
> + * Whether this actually enables EEH or not for this device depends
> + * on the CEC architecture, type of the device, on earlier boot
> + * command-line arguments & etc.
> + */
> +static void eeh_add_device_early(struct device_node *dn)
> +{
> +	struct pci_controller *phb;
> +
> +	if (!of_node_to_eeh_dev(dn))
> +		return;
> +	phb = of_node_to_eeh_dev(dn)->phb;
> +
> +	/* USB Bus children of PCI devices will not have BUID's */
> +	if (NULL == phb || 0 == phb->buid)
> +		return;
> +
> +	/* FIXME: hotplug support on POWERNV */
> +	eeh_ops->of_probe(dn, NULL);
> +}
> +
> +/**
> + * eeh_add_device_tree_early - Enable EEH for the indicated device
> + * @dn: device node
> + *
> + * This routine must be used to perform EEH initialization for the
> + * indicated PCI device that was added after system boot (e.g.
> + * hotplug, dlpar).
> + */
> +void eeh_add_device_tree_early(struct device_node *dn)
> +{
> +	struct device_node *sib;
> +
> +	for_each_child_of_node(dn, sib)
> +		eeh_add_device_tree_early(sib);
> +	eeh_add_device_early(dn);
> +}
> +EXPORT_SYMBOL_GPL(eeh_add_device_tree_early);
> +
> +/**
> + * eeh_add_device_late - Perform EEH initialization for the indicated pci device
> + * @dev: pci device for which to set up EEH
> + *
> + * This routine must be used to complete EEH initialization for PCI
> + * devices that were added after system boot (e.g. hotplug, dlpar).
> + */
> +static void eeh_add_device_late(struct pci_dev *dev)
> +{
> +	struct device_node *dn;
> +	struct eeh_dev *edev;
> +
> +	if (!dev || !eeh_subsystem_enabled)
> +		return;
> +
> +	pr_debug("EEH: Adding device %s\n", pci_name(dev));
> +
> +	dn = pci_device_to_OF_node(dev);
> +	edev = of_node_to_eeh_dev(dn);
> +	if (edev->pdev == dev) {
> +		pr_debug("EEH: Already referenced !\n");
> +		return;
> +	}
> +	WARN_ON(edev->pdev);
> +
> +	pci_dev_get(dev);
> +	edev->pdev = dev;
> +	dev->dev.archdata.edev = edev;
> +
> +	eeh_addr_cache_insert_dev(dev);
> +}
> +
> +/**
> + * eeh_add_device_tree_late - Perform EEH initialization for the indicated PCI bus
> + * @bus: PCI bus
> + *
> + * This routine must be used to perform EEH initialization for PCI
> + * devices which are attached to the indicated PCI bus. The PCI bus
> + * is added after system boot through hotplug or dlpar.
> + */
> +void eeh_add_device_tree_late(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> + 		eeh_add_device_late(dev);
> + 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> + 			struct pci_bus *subbus = dev->subordinate;
> + 			if (subbus)
> + 				eeh_add_device_tree_late(subbus);
> + 		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
> +
> +/**
> + * eeh_add_sysfs_files - Add EEH sysfs files for the indicated PCI bus
> + * @bus: PCI bus
> + *
> + * This routine must be used to add EEH sysfs files for PCI
> + * devices which are attached to the indicated PCI bus. The PCI bus
> + * is added after system boot through hotplug or dlpar.
> + */
> +void eeh_add_sysfs_files(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		eeh_sysfs_add_device(dev);
> +		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +			struct pci_bus *subbus = dev->subordinate;
> +			if (subbus)
> +				eeh_add_sysfs_files(subbus);
> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(eeh_add_sysfs_files);
> +
> +/**
> + * eeh_remove_device - Undo EEH setup for the indicated pci device
> + * @dev: pci device to be removed
> + * @purge_pe: remove the PE or not
> + *
> + * This routine should be called when a device is removed from
> + * a running system (e.g. by hotplug or dlpar).  It unregisters
> + * the PCI device from the EEH subsystem.  I/O errors affecting
> + * this device will no longer be detected after this call; thus,
> + * i/o errors affecting this slot may leave this device unusable.
> + */
> +static void eeh_remove_device(struct pci_dev *dev, int purge_pe)
> +{
> +	struct eeh_dev *edev;
> +
> +	if (!dev || !eeh_subsystem_enabled)
> +		return;
> +	edev = pci_dev_to_eeh_dev(dev);
> +
> +	/* Unregister the device with the EEH/PCI address search system */
> +	pr_debug("EEH: Removing device %s\n", pci_name(dev));
> +
> +	if (!edev || !edev->pdev) {
> +		pr_debug("EEH: Not referenced !\n");
> +		return;
> +	}
> +	edev->pdev = NULL;
> +	dev->dev.archdata.edev = NULL;
> +	pci_dev_put(dev);
> +
> +	eeh_rmv_from_parent_pe(edev, purge_pe);
> +	eeh_addr_cache_rmv_dev(dev);
> +	eeh_sysfs_remove_device(dev);
> +}
> +
> +/**
> + * eeh_remove_bus_device - Undo EEH setup for the indicated PCI device
> + * @dev: PCI device
> + * @purge_pe: remove the corresponding PE or not
> + *
> + * This routine must be called when a device is removed from the
> + * running system through hotplug or dlpar. The corresponding
> + * PCI address cache will be removed.
> + */
> +void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe)
> +{
> +	struct pci_bus *bus = dev->subordinate;
> +	struct pci_dev *child, *tmp;
> +
> +	eeh_remove_device(dev, purge_pe);
> +
> +	if (bus && dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +		list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
> +			 eeh_remove_bus_device(child, purge_pe);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(eeh_remove_bus_device);
> +
> +static int proc_eeh_show(struct seq_file *m, void *v)
> +{
> +	if (0 == eeh_subsystem_enabled) {
> +		seq_printf(m, "EEH Subsystem is globally disabled\n");
> +		seq_printf(m, "eeh_total_mmio_ffs=%llu\n", eeh_stats.total_mmio_ffs);
> +	} else {
> +		seq_printf(m, "EEH Subsystem is enabled\n");
> +		seq_printf(m,
> +				"no device=%llu\n"
> +				"no device node=%llu\n"
> +				"no config address=%llu\n"
> +				"check not wanted=%llu\n"
> +				"eeh_total_mmio_ffs=%llu\n"
> +				"eeh_false_positives=%llu\n"
> +				"eeh_slot_resets=%llu\n",
> +				eeh_stats.no_device,
> +				eeh_stats.no_dn,
> +				eeh_stats.no_cfg_addr,
> +				eeh_stats.ignored_check,
> +				eeh_stats.total_mmio_ffs,
> +				eeh_stats.false_positives,
> +				eeh_stats.slot_resets);
> +	}
> +
> +	return 0;
> +}
> +
> +static int proc_eeh_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, proc_eeh_show, NULL);
> +}
> +
> +static const struct file_operations proc_eeh_operations = {
> +	.open      = proc_eeh_open,
> +	.read      = seq_read,
> +	.llseek    = seq_lseek,
> +	.release   = single_release,
> +};
> +
> +static int __init eeh_init_proc(void)
> +{
> +	if (machine_is(pseries))
> +		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
> +	return 0;
> +}
> +__initcall(eeh_init_proc);
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> new file mode 100644
> index 0000000..5a4c879
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -0,0 +1,319 @@
> +/*
> + * PCI address cache; allows the lookup of PCI devices based on I/O address
> + *
> + * Copyright IBM Corporation 2004
> + * Copyright Linas Vepstas <linas@austin.ibm.com> 2004
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + */
> +
> +#include <linux/list.h>
> +#include <linux/pci.h>
> +#include <linux/rbtree.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +#include <asm/pci-bridge.h>
> +#include <asm/ppc-pci.h>
> +
> +
> +/**
> + * The pci address cache subsystem.  This subsystem places
> + * PCI device address resources into a red-black tree, sorted
> + * according to the address range, so that given only an i/o
> + * address, the corresponding PCI device can be **quickly**
> + * found. It is safe to perform an address lookup in an interrupt
> + * context; this ability is an important feature.
> + *
> + * Currently, the only customer of this code is the EEH subsystem;
> + * thus, this code has been somewhat tailored to suit EEH better.
> + * In particular, the cache does *not* hold the addresses of devices
> + * for which EEH is not enabled.
> + *
> + * (Implementation Note: The RB tree seems to be better/faster
> + * than any hash algo I could think of for this problem, even
> + * with the penalty of slow pointer chases for d-cache misses).
> + */
> +struct pci_io_addr_range {
> +	struct rb_node rb_node;
> +	unsigned long addr_lo;
> +	unsigned long addr_hi;
> +	struct eeh_dev *edev;
> +	struct pci_dev *pcidev;
> +	unsigned int flags;
> +};
> +
> +static struct pci_io_addr_cache {
> +	struct rb_root rb_root;
> +	spinlock_t piar_lock;
> +} pci_io_addr_cache_root;
> +
> +static inline struct eeh_dev *__eeh_addr_cache_get_device(unsigned long addr)
> +{
> +	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
> +
> +	while (n) {
> +		struct pci_io_addr_range *piar;
> +		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +
> +		if (addr < piar->addr_lo) {
> +			n = n->rb_left;
> +		} else {
> +			if (addr > piar->addr_hi) {
> +				n = n->rb_right;
> +			} else {
> +				pci_dev_get(piar->pcidev);
> +				return piar->edev;
> +			}
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_addr_cache_get_dev - Get device, given only address
> + * @addr: mmio (PIO) phys address or i/o port number
> + *
> + * Given an mmio phys address, or a port number, find a pci device
> + * that implements this address.  Be sure to pci_dev_put the device
> + * when finished.  I/O port numbers are assumed to be offset
> + * from zero (that is, they do *not* have pci_io_addr added in).
> + * It is safe to call this function within an interrupt.
> + */
> +struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr)
> +{
> +	struct eeh_dev *edev;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> +	edev = __eeh_addr_cache_get_device(addr);
> +	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> +	return edev;
> +}
> +
> +#ifdef DEBUG
> +/*
> + * Handy-dandy debug print routine, does nothing more
> + * than print out the contents of our addr cache.
> + */
> +static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
> +{
> +	struct rb_node *n;
> +	int cnt = 0;
> +
> +	n = rb_first(&cache->rb_root);
> +	while (n) {
> +		struct pci_io_addr_range *piar;
> +		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +		pr_debug("PCI: %s addr range %d [%lx-%lx]: %s\n",
> +		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> +		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
> +		cnt++;
> +		n = rb_next(n);
> +	}
> +}
> +#endif
> +
> +/* Insert address range into the rb tree. */
> +static struct pci_io_addr_range *
> +eeh_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
> +		      unsigned long ahi, unsigned int flags)
> +{
> +	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct pci_io_addr_range *piar;
> +
> +	/* Walk tree, find a place to insert into tree */
> +	while (*p) {
> +		parent = *p;
> +		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
> +		if (ahi < piar->addr_lo) {
> +			p = &parent->rb_left;
> +		} else if (alo > piar->addr_hi) {
> +			p = &parent->rb_right;
> +		} else {
> +			if (dev != piar->pcidev ||
> +			    alo != piar->addr_lo || ahi != piar->addr_hi) {
> +				pr_warning("PIAR: overlapping address range\n");
> +			}
> +			return piar;
> +		}
> +	}
> +	piar = kzalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
> +	if (!piar)
> +		return NULL;
> +
> +	pci_dev_get(dev);
> +	piar->addr_lo = alo;
> +	piar->addr_hi = ahi;
> +	piar->edev = pci_dev_to_eeh_dev(dev);
> +	piar->pcidev = dev;
> +	piar->flags = flags;
> +
> +#ifdef DEBUG
> +	pr_debug("PIAR: insert range=[%lx:%lx] dev=%s\n",
> +	                  alo, ahi, pci_name(dev));
> +#endif
> +
> +	rb_link_node(&piar->rb_node, parent, p);
> +	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
> +
> +	return piar;
> +}
> +
> +static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
> +{
> +	struct device_node *dn;
> +	struct eeh_dev *edev;
> +	int i;
> +
> +	dn = pci_device_to_OF_node(dev);
> +	if (!dn) {
> +		pr_warning("PCI: no pci dn found for dev=%s\n", pci_name(dev));
> +		return;
> +	}
> +
> +	edev = of_node_to_eeh_dev(dn);
> +	if (!edev) {
> +		pr_warning("PCI: no EEH dev found for dn=%s\n",
> +			dn->full_name);
> +		return;
> +	}
> +
> +	/* Skip any devices for which EEH is not enabled. */
> +	if (!edev->pe) {
> +#ifdef DEBUG
> +		pr_info("PCI: skip building address cache for=%s - %s\n",
> +			pci_name(dev), dn->full_name);
> +#endif
> +		return;
> +	}
> +
> +	/* Walk resources on this device, poke them into the tree */
> +	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> +		unsigned long start = pci_resource_start(dev,i);
> +		unsigned long end = pci_resource_end(dev,i);
> +		unsigned int flags = pci_resource_flags(dev,i);
> +
> +		/* We are interested only bus addresses, not dma or other stuff */
> +		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
> +			continue;
> +		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
> +			 continue;
> +		eeh_addr_cache_insert(dev, start, end, flags);
> +	}
> +}
> +
> +/**
> + * eeh_addr_cache_insert_dev - Add a device to the address cache
> + * @dev: PCI device whose I/O addresses we are interested in.
> + *
> + * In order to support the fast lookup of devices based on addresses,
> + * we maintain a cache of devices that can be quickly searched.
> + * This routine adds a device to that cache.
> + */
> +void eeh_addr_cache_insert_dev(struct pci_dev *dev)
> +{
> +	unsigned long flags;
> +
> +	/* Ignore PCI bridges */
> +	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
> +		return;
> +
> +	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> +	__eeh_addr_cache_insert_dev(dev);
> +	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> +}
> +
> +static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
> +{
> +	struct rb_node *n;
> +
> +restart:
> +	n = rb_first(&pci_io_addr_cache_root.rb_root);
> +	while (n) {
> +		struct pci_io_addr_range *piar;
> +		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +
> +		if (piar->pcidev == dev) {
> +			rb_erase(n, &pci_io_addr_cache_root.rb_root);
> +			pci_dev_put(piar->pcidev);
> +			kfree(piar);
> +			goto restart;
> +		}
> +		n = rb_next(n);
> +	}
> +}
> +
> +/**
> + * eeh_addr_cache_rmv_dev - remove pci device from addr cache
> + * @dev: device to remove
> + *
> + * Remove a device from the addr-cache tree.
> + * This is potentially expensive, since it will walk
> + * the tree multiple times (once per resource).
> + * But so what; device removal doesn't need to be that fast.
> + */
> +void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> +	__eeh_addr_cache_rmv_dev(dev);
> +	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> +}
> +
> +/**
> + * eeh_addr_cache_build - Build a cache of I/O addresses
> + *
> + * Build a cache of pci i/o addresses.  This cache will be used to
> + * find the pci device that corresponds to a given address.
> + * This routine scans all pci busses to build the cache.
> + * Must be run late in boot process, after the pci controllers
> + * have been scanned for devices (after all device resources are known).
> + */
> +void __init eeh_addr_cache_build(void)
> +{
> +	struct device_node *dn;
> +	struct eeh_dev *edev;
> +	struct pci_dev *dev = NULL;
> +
> +	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> +
> +	for_each_pci_dev(dev) {
> +		eeh_addr_cache_insert_dev(dev);
> +
> +		dn = pci_device_to_OF_node(dev);
> +		if (!dn)
> +			continue;
> +
> +		edev = of_node_to_eeh_dev(dn);
> +		if (!edev)
> +			continue;
> +
> +		pci_dev_get(dev);  /* matching put is in eeh_remove_device() */
> +		dev->dev.archdata.edev = edev;
> +		edev->pdev = dev;
> +
> +		eeh_sysfs_add_device(dev);
> +	}
> +
> +#ifdef DEBUG
> +	/* Verify tree built up above, echo back the list of addrs. */
> +	eeh_addr_cache_print(&pci_io_addr_cache_root);
> +#endif
> +}
> +
> diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
> new file mode 100644
> index 0000000..1efa28f
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_dev.c
> @@ -0,0 +1,112 @@
> +/*
> + * The file intends to implement dynamic creation of EEH device, which will
> + * be bound with OF node and PCI device simutaneously. The EEH devices would
> + * be foundamental information for EEH core components to work proerly. Besides,
> + * We have to support multiple situations where dynamic creation of EEH device
> + * is required:
> + *
> + * 1) Before PCI emunation starts, we need create EEH devices according to the
> + *    PCI sensitive OF nodes.
> + * 2) When PCI emunation is done, we need do the binding between PCI device and
> + *    the associated EEH device.
> + * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
> + *    will be created while PCI sensitive OF node is detected from DR.
> + * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
> + *    PHB is newly inserted, we also need create EEH devices accordingly.
> + *
> + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + */
> +
> +#include <linux/export.h>
> +#include <linux/gfp.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/string.h>
> +
> +#include <asm/pci-bridge.h>
> +#include <asm/ppc-pci.h>
> +
> +/**
> + * eeh_dev_init - Create EEH device according to OF node
> + * @dn: device node
> + * @data: PHB
> + *
> + * It will create EEH device according to the given OF node. The function
> + * might be called by PCI emunation, DR, PHB hotplug.
> + */
> +void *eeh_dev_init(struct device_node *dn, void *data)
> +{
> +	struct pci_controller *phb = data;
> +	struct eeh_dev *edev;
> +
> +	/* Allocate EEH device */
> +	edev = kzalloc(sizeof(*edev), GFP_KERNEL);
> +	if (!edev) {
> +		pr_warning("%s: out of memory\n", __func__);
> +		return NULL;
> +	}
> +
> +	/* Associate EEH device with OF node */
> +	PCI_DN(dn)->edev = edev;
> +	edev->dn  = dn;
> +	edev->phb = phb;
> +	INIT_LIST_HEAD(&edev->list);
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
> + * @phb: PHB
> + *
> + * Scan the PHB OF node and its child association, then create the
> + * EEH devices accordingly
> + */
> +void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
> +{
> +	struct device_node *dn = phb->dn;
> +
> +	/* EEH PE for PHB */
> +	eeh_phb_pe_create(phb);
> +
> +	/* EEH device for PHB */
> +	eeh_dev_init(dn, phb);
> +
> +	/* EEH devices for children OF nodes */
> +	traverse_pci_devices(dn, eeh_dev_init, phb);
> +}
> +
> +/**
> + * eeh_dev_phb_init - Create EEH devices for devices included in existing PHBs
> + *
> + * Scan all the existing PHBs and create EEH devices for their OF
> + * nodes and their children OF nodes
> + */
> +static int __init eeh_dev_phb_init(void)
> +{
> +	struct pci_controller *phb, *tmp;
> +
> +	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
> +		eeh_dev_phb_init_dynamic(phb);
> +
> +	pr_info("EEH: devices created\n");
> +
> +	return 0;
> +}
> +
> +core_initcall(eeh_dev_phb_init);
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> new file mode 100644
> index 0000000..a3fefb6
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -0,0 +1,552 @@
> +/*
> + * PCI Error Recovery Driver for RPA-compliant PPC64 platform.
> + * Copyright IBM Corp. 2004 2005
> + * Copyright Linas Vepstas <linas@linas.org> 2004, 2005
> + *
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or (at
> + * your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + *
> + * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> + */
> +#include <linux/delay.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <asm/eeh.h>
> +#include <asm/eeh_event.h>
> +#include <asm/ppc-pci.h>
> +#include <asm/pci-bridge.h>
> +#include <asm/prom.h>
> +#include <asm/rtas.h>
> +
> +/**
> + * eeh_pcid_name - Retrieve name of PCI device driver
> + * @pdev: PCI device
> + *
> + * This routine is used to retrieve the name of PCI device driver
> + * if that's valid.
> + */
> +static inline const char *eeh_pcid_name(struct pci_dev *pdev)
> +{
> +	if (pdev && pdev->dev.driver)
> +		return pdev->dev.driver->name;
> +	return "";
> +}
> +
> +/**
> + * eeh_pcid_get - Get the PCI device driver
> + * @pdev: PCI device
> + *
> + * The function is used to retrieve the PCI device driver for
> + * the indicated PCI device. Besides, we will increase the reference
> + * of the PCI device driver to prevent that being unloaded on
> + * the fly. Otherwise, kernel crash would be seen.
> + */
> +static inline struct pci_driver *eeh_pcid_get(struct pci_dev *pdev)
> +{
> +	if (!pdev || !pdev->driver)
> +		return NULL;
> +
> +	if (!try_module_get(pdev->driver->driver.owner))
> +		return NULL;
> +
> +	return pdev->driver;
> +}
> +
> +/**
> + * eeh_pcid_put - Dereference on the PCI device driver
> + * @pdev: PCI device
> + *
> + * The function is called to do dereference on the PCI device
> + * driver of the indicated PCI device.
> + */
> +static inline void eeh_pcid_put(struct pci_dev *pdev)
> +{
> +	if (!pdev || !pdev->driver)
> +		return;
> +
> +	module_put(pdev->driver->driver.owner);
> +}
> +
> +#if 0
> +static void print_device_node_tree(struct pci_dn *pdn, int dent)
> +{
> +	int i;
> +	struct device_node *pc;
> +
> +	if (!pdn)
> +		return;
> +	for (i = 0; i < dent; i++)
> +		printk(" ");
> +	printk("dn=%s mode=%x \tcfg_addr=%x pe_addr=%x \tfull=%s\n",
> +		pdn->node->name, pdn->eeh_mode, pdn->eeh_config_addr,
> +		pdn->eeh_pe_config_addr, pdn->node->full_name);
> +	dent += 3;
> +	pc = pdn->node->child;
> +	while (pc) {
> +		print_device_node_tree(PCI_DN(pc), dent);
> +		pc = pc->sibling;
> +	}
> +}
> +#endif
> +
> +/**
> + * eeh_disable_irq - Disable interrupt for the recovering device
> + * @dev: PCI device
> + *
> + * This routine must be called when reporting temporary or permanent
> + * error to the particular PCI device to disable interrupt of that
> + * device. If the device has enabled MSI or MSI-X interrupt, we needn't
> + * do real work because EEH should freeze DMA transfers for those PCI
> + * devices encountering EEH errors, which includes MSI or MSI-X.
> + */
> +static void eeh_disable_irq(struct pci_dev *dev)
> +{
> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> +
> +	/* Don't disable MSI and MSI-X interrupts. They are
> +	 * effectively disabled by the DMA Stopped state
> +	 * when an EEH error occurs.
> +	 */
> +	if (dev->msi_enabled || dev->msix_enabled)
> +		return;
> +
> +	if (!irq_has_action(dev->irq))
> +		return;
> +
> +	edev->mode |= EEH_DEV_IRQ_DISABLED;
> +	disable_irq_nosync(dev->irq);
> +}
> +
> +/**
> + * eeh_enable_irq - Enable interrupt for the recovering device
> + * @dev: PCI device
> + *
> + * This routine must be called to enable interrupt while failed
> + * device could be resumed.
> + */
> +static void eeh_enable_irq(struct pci_dev *dev)
> +{
> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> +
> +	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
> +		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
> +		enable_irq(dev->irq);
> +	}
> +}
> +
> +/**
> + * eeh_report_error - Report pci error to each device driver
> + * @data: eeh device
> + * @userdata: return value
> + *
> + * Report an EEH error to each device driver, collect up and
> + * merge the device driver responses. Cumulative response
> + * passed back in "userdata".
> + */
> +static void *eeh_report_error(void *data, void *userdata)
> +{
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	enum pci_ers_result rc, *res = userdata;
> +	struct pci_driver *driver;
> +
> +	/* We might not have the associated PCI device,
> +	 * then we should continue for next one.
> +	 */
> +	if (!dev) return NULL;
> +	dev->error_state = pci_channel_io_frozen;
> +
> +	driver = eeh_pcid_get(dev);
> +	if (!driver) return NULL;
> +
> +	eeh_disable_irq(dev);
> +
> +	if (!driver->err_handler ||
> +	    !driver->err_handler->error_detected) {
> +		eeh_pcid_put(dev);
> +		return NULL;
> +	}
> +
> +	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
> +
> +	/* A driver that needs a reset trumps all others */
> +	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> +	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +
> +	eeh_pcid_put(dev);
> +	return NULL;
> +}
> +
> +/**
> + * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
> + * @data: eeh device
> + * @userdata: return value
> + *
> + * Tells each device driver that IO ports, MMIO and config space I/O
> + * are now enabled. Collects up and merges the device driver responses.
> + * Cumulative response passed back in "userdata".
> + */
> +static void *eeh_report_mmio_enabled(void *data, void *userdata)
> +{
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	enum pci_ers_result rc, *res = userdata;
> +	struct pci_driver *driver;
> +
> +	driver = eeh_pcid_get(dev);
> +	if (!driver) return NULL;
> +
> +	if (!driver->err_handler ||
> +	    !driver->err_handler->mmio_enabled) {
> +		eeh_pcid_put(dev);
> +		return NULL;
> +	}
> +
> +	rc = driver->err_handler->mmio_enabled(dev);
> +
> +	/* A driver that needs a reset trumps all others */
> +	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> +	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> +
> +	eeh_pcid_put(dev);
> +	return NULL;
> +}
> +
> +/**
> + * eeh_report_reset - Tell device that slot has been reset
> + * @data: eeh device
> + * @userdata: return value
> + *
> + * This routine must be called while EEH tries to reset particular
> + * PCI device so that the associated PCI device driver could take
> + * some actions, usually to save data the driver needs so that the
> + * driver can work again while the device is recovered.
> + */
> +static void *eeh_report_reset(void *data, void *userdata)
> +{
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	enum pci_ers_result rc, *res = userdata;
> +	struct pci_driver *driver;
> +
> +	if (!dev) return NULL;
> +	dev->error_state = pci_channel_io_normal;
> +
> +	driver = eeh_pcid_get(dev);
> +	if (!driver) return NULL;
> +
> +	eeh_enable_irq(dev);
> +
> +	if (!driver->err_handler ||
> +	    !driver->err_handler->slot_reset) {
> +		eeh_pcid_put(dev);
> +		return NULL;
> +	}
> +
> +	rc = driver->err_handler->slot_reset(dev);
> +	if ((*res == PCI_ERS_RESULT_NONE) ||
> +	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> +	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> +	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> +
> +	eeh_pcid_put(dev);
> +	return NULL;
> +}
> +
> +/**
> + * eeh_report_resume - Tell device to resume normal operations
> + * @data: eeh device
> + * @userdata: return value
> + *
> + * This routine must be called to notify the device driver that it
> + * could resume so that the device driver can do some initialization
> + * to make the recovered device work again.
> + */
> +static void *eeh_report_resume(void *data, void *userdata)
> +{
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	struct pci_driver *driver;
> +
> +	if (!dev) return NULL;
> +	dev->error_state = pci_channel_io_normal;
> +
> +	driver = eeh_pcid_get(dev);
> +	if (!driver) return NULL;
> +
> +	eeh_enable_irq(dev);
> +
> +	if (!driver->err_handler ||
> +	    !driver->err_handler->resume) {
> +		eeh_pcid_put(dev);
> +		return NULL;
> +	}
> +
> +	driver->err_handler->resume(dev);
> +
> +	eeh_pcid_put(dev);
> +	return NULL;
> +}
> +
> +/**
> + * eeh_report_failure - Tell device driver that device is dead.
> + * @data: eeh device
> + * @userdata: return value
> + *
> + * This informs the device driver that the device is permanently
> + * dead, and that no further recovery attempts will be made on it.
> + */
> +static void *eeh_report_failure(void *data, void *userdata)
> +{
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> +	struct pci_driver *driver;
> +
> +	if (!dev) return NULL;
> +	dev->error_state = pci_channel_io_perm_failure;
> +
> +	driver = eeh_pcid_get(dev);
> +	if (!driver) return NULL;
> +
> +	eeh_disable_irq(dev);
> +
> +	if (!driver->err_handler ||
> +	    !driver->err_handler->error_detected) {
> +		eeh_pcid_put(dev);
> +		return NULL;
> +	}
> +
> +	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
> +
> +	eeh_pcid_put(dev);
> +	return NULL;
> +}
> +
> +/**
> + * eeh_reset_device - Perform actual reset of a pci slot
> + * @pe: EEH PE
> + * @bus: PCI bus corresponding to the isolcated slot
> + *
> + * This routine must be called to do reset on the indicated PE.
> + * During the reset, udev might be invoked because those affected
> + * PCI devices will be removed and then added.
> + */
> +static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> +{
> +	int cnt, rc;
> +
> +	/* pcibios will clear the counter; save the value */
> +	cnt = pe->freeze_count;
> +
> +	/*
> +	 * We don't remove the corresponding PE instances because
> +	 * we need the information afterwords. The attached EEH
> +	 * devices are expected to be attached soon when calling
> +	 * into pcibios_add_pci_devices().
> +	 */
> +	if (bus)
> +		__pcibios_remove_pci_devices(bus, 0);
> +
> +	/* Reset the pci controller. (Asserts RST#; resets config space).
> +	 * Reconfigure bridges and devices. Don't try to bring the system
> +	 * up if the reset failed for some reason.
> +	 */
> +	rc = eeh_reset_pe(pe);
> +	if (rc)
> +		return rc;
> +
> +	/* Restore PE */
> +	eeh_ops->configure_bridge(pe);
> +	eeh_pe_restore_bars(pe);
> +
> +	/* Give the system 5 seconds to finish running the user-space
> +	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
> +	 * this is a hack, but if we don't do this, and try to bring
> +	 * the device up before the scripts have taken it down,
> +	 * potentially weird things happen.
> +	 */
> +	if (bus) {
> +		ssleep(5);
> +		pcibios_add_pci_devices(bus);
> +	}
> +	pe->freeze_count = cnt;
> +
> +	return 0;
> +}
> +
> +/* The longest amount of time to wait for a pci device
> + * to come back on line, in seconds.
> + */
> +#define MAX_WAIT_FOR_RECOVERY 150
> +
> +/**
> + * eeh_handle_event - Reset a PCI device after hard lockup.
> + * @pe: EEH PE
> + *
> + * While PHB detects address or data parity errors on particular PCI
> + * slot, the associated PE will be frozen. Besides, DMA's occurring
> + * to wild addresses (which usually happen due to bugs in device
> + * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
> + * #PERR or other misc PCI-related errors also can trigger EEH errors.
> + *
> + * Recovery process consists of unplugging the device driver (which
> + * generated hotplug events to userspace), then issuing a PCI #RST to
> + * the device, then reconfiguring the PCI config space for all bridges
> + * & devices under this slot, and then finally restarting the device
> + * drivers (which cause a second set of hotplug events to go out to
> + * userspace).
> + */
> +void eeh_handle_event(struct eeh_pe *pe)
> +{
> +	struct pci_bus *frozen_bus;
> +	int rc = 0;
> +	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
> +
> +	frozen_bus = eeh_pe_bus_get(pe);
> +	if (!frozen_bus) {
> +		pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
> +			__func__, pe->phb->global_number, pe->addr);
> +		return;
> +	}
> +
> +	pe->freeze_count++;
> +	if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES)
> +		goto excess_failures;
> +	pr_warning("EEH: This PCI device has failed %d times in the last hour\n",
> +		pe->freeze_count);
> +
> +	/* Walk the various device drivers attached to this slot through
> +	 * a reset sequence, giving each an opportunity to do what it needs
> +	 * to accomplish the reset.  Each child gets a report of the
> +	 * status ... if any child can't handle the reset, then the entire
> +	 * slot is dlpar removed and added.
> +	 */
> +	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
> +
> +	/* Get the current PCI slot state. This can take a long time,
> +	 * sometimes over 3 seconds for certain systems.
> +	 */
> +	rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000);
> +	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
> +		printk(KERN_WARNING "EEH: Permanent failure\n");
> +		goto hard_fail;
> +	}
> +
> +	/* Since rtas may enable MMIO when posting the error log,
> +	 * don't post the error log until after all dev drivers
> +	 * have been informed.
> +	 */
> +	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
> +
> +	/* If all device drivers were EEH-unaware, then shut
> +	 * down all of the device drivers, and hope they
> +	 * go down willingly, without panicing the system.
> +	 */
> +	if (result == PCI_ERS_RESULT_NONE) {
> +		rc = eeh_reset_device(pe, frozen_bus);
> +		if (rc) {
> +			printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc);
> +			goto hard_fail;
> +		}
> +	}
> +
> +	/* If all devices reported they can proceed, then re-enable MMIO */
> +	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
> +		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
> +
> +		if (rc < 0)
> +			goto hard_fail;
> +		if (rc) {
> +			result = PCI_ERS_RESULT_NEED_RESET;
> +		} else {
> +			result = PCI_ERS_RESULT_NONE;
> +			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
> +		}
> +	}
> +
> +	/* If all devices reported they can proceed, then re-enable DMA */
> +	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
> +		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
> +
> +		if (rc < 0)
> +			goto hard_fail;
> +		if (rc)
> +			result = PCI_ERS_RESULT_NEED_RESET;
> +		else
> +			result = PCI_ERS_RESULT_RECOVERED;
> +	}
> +
> +	/* If any device has a hard failure, then shut off everything. */
> +	if (result == PCI_ERS_RESULT_DISCONNECT) {
> +		printk(KERN_WARNING "EEH: Device driver gave up\n");
> +		goto hard_fail;
> +	}
> +
> +	/* If any device called out for a reset, then reset the slot */
> +	if (result == PCI_ERS_RESULT_NEED_RESET) {
> +		rc = eeh_reset_device(pe, NULL);
> +		if (rc) {
> +			printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc);
> +			goto hard_fail;
> +		}
> +		result = PCI_ERS_RESULT_NONE;
> +		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
> +	}
> +
> +	/* All devices should claim they have recovered by now. */
> +	if ((result != PCI_ERS_RESULT_RECOVERED) &&
> +	    (result != PCI_ERS_RESULT_NONE)) {
> +		printk(KERN_WARNING "EEH: Not recovered\n");
> +		goto hard_fail;
> +	}
> +
> +	/* Tell all device drivers that they can resume operations */
> +	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> +
> +	return;
> +	
> +excess_failures:
> +	/*
> +	 * About 90% of all real-life EEH failures in the field
> +	 * are due to poorly seated PCI cards. Only 10% or so are
> +	 * due to actual, failed cards.
> +	 */
> +	pr_err("EEH: PHB#%d-PE#%x has failed %d times in the\n"
> +	       "last hour and has been permanently disabled.\n"
> +	       "Please try reseating or replacing it.\n",
> +		pe->phb->global_number, pe->addr,
> +		pe->freeze_count);
> +	goto perm_error;
> +
> +hard_fail:
> +	pr_err("EEH: Unable to recover from failure from PHB#%d-PE#%x.\n"
> +	       "Please try reseating or replacing it\n",
> +		pe->phb->global_number, pe->addr);
> +
> +perm_error:
> +	eeh_slot_error_detail(pe, EEH_LOG_PERM);
> +
> +	/* Notify all devices that they're about to go down. */
> +	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
> +
> +	/* Shut down the device drivers for good. */
> +	if (frozen_bus)
> +		pcibios_remove_pci_devices(frozen_bus);
> +}
> +
> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> new file mode 100644
> index 0000000..185bedd
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -0,0 +1,142 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + *
> + * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +#include <linux/kthread.h>
> +#include <asm/eeh_event.h>
> +#include <asm/ppc-pci.h>
> +
> +/** Overview:
> + *  EEH error states may be detected within exception handlers;
> + *  however, the recovery processing needs to occur asynchronously
> + *  in a normal kernel context and not an interrupt context.
> + *  This pair of routines creates an event and queues it onto a
> + *  work-queue, where a worker thread can drive recovery.
> + */
> +
> +/* EEH event workqueue setup. */
> +static DEFINE_SPINLOCK(eeh_eventlist_lock);
> +LIST_HEAD(eeh_eventlist);
> +static void eeh_thread_launcher(struct work_struct *);
> +DECLARE_WORK(eeh_event_wq, eeh_thread_launcher);
> +
> +/* Serialize reset sequences for a given pci device */
> +DEFINE_MUTEX(eeh_event_mutex);
> +
> +/**
> + * eeh_event_handler - Dispatch EEH events.
> + * @dummy - unused
> + *
> + * The detection of a frozen slot can occur inside an interrupt,
> + * where it can be hard to do anything about it.  The goal of this
> + * routine is to pull these detection events out of the context
> + * of the interrupt handler, and re-dispatch them for processing
> + * at a later time in a normal context.
> + */
> +static int eeh_event_handler(void * dummy)
> +{
> +	unsigned long flags;
> +	struct eeh_event *event;
> +	struct eeh_pe *pe;
> +
> +	spin_lock_irqsave(&eeh_eventlist_lock, flags);
> +	event = NULL;
> +
> +	/* Unqueue the event, get ready to process. */
> +	if (!list_empty(&eeh_eventlist)) {
> +		event = list_entry(eeh_eventlist.next, struct eeh_event, list);
> +		list_del(&event->list);
> +	}
> +	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
> +
> +	if (event == NULL)
> +		return 0;
> +
> +	/* Serialize processing of EEH events */
> +	mutex_lock(&eeh_event_mutex);
> +	pe = event->pe;
> +	eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
> +	pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n",
> +		pe->phb->global_number, pe->addr);
> +
> +	set_current_state(TASK_INTERRUPTIBLE);	/* Don't add to load average */
> +	eeh_handle_event(pe);
> +	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
> +
> +	kfree(event);
> +	mutex_unlock(&eeh_event_mutex);
> +
> +	/* If there are no new errors after an hour, clear the counter. */
> +	if (pe && pe->freeze_count > 0) {
> +		msleep_interruptible(3600*1000);
> +		if (pe->freeze_count > 0)
> +			pe->freeze_count--;
> +
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * eeh_thread_launcher - Start kernel thread to handle EEH events
> + * @dummy - unused
> + *
> + * This routine is called to start the kernel thread for processing
> + * EEH event.
> + */
> +static void eeh_thread_launcher(struct work_struct *dummy)
> +{
> +	if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd")))
> +		printk(KERN_ERR "Failed to start EEH daemon\n");
> +}
> +
> +/**
> + * eeh_send_failure_event - Generate a PCI error event
> + * @pe: EEH PE
> + *
> + * This routine can be called within an interrupt context;
> + * the actual event will be delivered in a normal context
> + * (from a workqueue).
> + */
> +int eeh_send_failure_event(struct eeh_pe *pe)
> +{
> +	unsigned long flags;
> +	struct eeh_event *event;
> +
> +	event = kzalloc(sizeof(*event), GFP_ATOMIC);
> +	if (!event) {
> +		pr_err("EEH: out of memory, event not handled\n");
> +		return -ENOMEM;
> +	}
> +	event->pe = pe;
> +
> +	/* We may or may not be called in an interrupt context */
> +	spin_lock_irqsave(&eeh_eventlist_lock, flags);
> +	list_add(&event->list, &eeh_eventlist);
> +	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
> +
> +	schedule_work(&eeh_event_wq);
> +
> +	return 0;
> +}
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> new file mode 100644
> index 0000000..9d4a9e8
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -0,0 +1,653 @@
> +/*
> + * The file intends to implement PE based on the information from
> + * platforms. Basically, there have 3 types of PEs: PHB/Bus/Device.
> + * All the PEs should be organized as hierarchy tree. The first level
> + * of the tree will be associated to existing PHBs since the particular
> + * PE is only meaningful in one PHB domain.
> + *
> + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + */
> +
> +#include <linux/export.h>
> +#include <linux/gfp.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/string.h>
> +
> +#include <asm/pci-bridge.h>
> +#include <asm/ppc-pci.h>
> +
> +static LIST_HEAD(eeh_phb_pe);
> +
> +/**
> + * eeh_pe_alloc - Allocate PE
> + * @phb: PCI controller
> + * @type: PE type
> + *
> + * Allocate PE instance dynamically.
> + */
> +static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
> +{
> +	struct eeh_pe *pe;
> +
> +	/* Allocate PHB PE */
> +	pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
> +	if (!pe) return NULL;
> +
> +	/* Initialize PHB PE */
> +	pe->type = type;
> +	pe->phb = phb;
> +	INIT_LIST_HEAD(&pe->child_list);
> +	INIT_LIST_HEAD(&pe->child);
> +	INIT_LIST_HEAD(&pe->edevs);
> +
> +	return pe;
> +}
> +
> +/**
> + * eeh_phb_pe_create - Create PHB PE
> + * @phb: PCI controller
> + *
> + * The function should be called while the PHB is detected during
> + * system boot or PCI hotplug in order to create PHB PE.
> + */
> +int eeh_phb_pe_create(struct pci_controller *phb)
> +{
> +	struct eeh_pe *pe;
> +
> +	/* Allocate PHB PE */
> +	pe = eeh_pe_alloc(phb, EEH_PE_PHB);
> +	if (!pe) {
> +		pr_err("%s: out of memory!\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	/* Put it into the list */
> +	eeh_lock();
> +	list_add_tail(&pe->child, &eeh_phb_pe);
> +	eeh_unlock();
> +
> +	pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number);
> +
> +	return 0;
> +}
> +
> +/**
> + * eeh_phb_pe_get - Retrieve PHB PE based on the given PHB
> + * @phb: PCI controller
> + *
> + * The overall PEs form hierarchy tree. The first layer of the
> + * hierarchy tree is composed of PHB PEs. The function is used
> + * to retrieve the corresponding PHB PE according to the given PHB.
> + */
> +static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
> +{
> +	struct eeh_pe *pe;
> +
> +	list_for_each_entry(pe, &eeh_phb_pe, child) {
> +		/*
> +		 * Actually, we needn't check the type since
> +		 * the PE for PHB has been determined when that
> +		 * was created.
> +		 */
> +		if ((pe->type & EEH_PE_PHB) && pe->phb == phb)
> +			return pe;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_next - Retrieve the next PE in the tree
> + * @pe: current PE
> + * @root: root PE
> + *
> + * The function is used to retrieve the next PE in the
> + * hierarchy PE tree.
> + */
> +static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
> +				  struct eeh_pe *root)
> +{
> +	struct list_head *next = pe->child_list.next;
> +
> +	if (next == &pe->child_list) {
> +		while (1) {
> +			if (pe == root)
> +				return NULL;
> +			next = pe->child.next;
> +			if (next != &pe->parent->child_list)
> +				break;
> +			pe = pe->parent;
> +		}
> +	}
> +
> +	return list_entry(next, struct eeh_pe, child);
> +}
> +
> +/**
> + * eeh_pe_traverse - Traverse PEs in the specified PHB
> + * @root: root PE
> + * @fn: callback
> + * @flag: extra parameter to callback
> + *
> + * The function is used to traverse the specified PE and its
> + * child PEs. The traversing is to be terminated once the
> + * callback returns something other than NULL, or no more PEs
> + * to be traversed.
> + */
> +static void *eeh_pe_traverse(struct eeh_pe *root,
> +			eeh_traverse_func fn, void *flag)
> +{
> +	struct eeh_pe *pe;
> +	void *ret;
> +
> +	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
> +		ret = fn(pe, flag);
> +		if (ret) return ret;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_dev_traverse - Traverse the devices from the PE
> + * @root: EEH PE
> + * @fn: function callback
> + * @flag: extra parameter to callback
> + *
> + * The function is used to traverse the devices of the specified
> + * PE and its child PEs.
> + */
> +void *eeh_pe_dev_traverse(struct eeh_pe *root,
> +		eeh_traverse_func fn, void *flag)
> +{
> +	struct eeh_pe *pe;
> +	struct eeh_dev *edev;
> +	void *ret;
> +
> +	if (!root) {
> +		pr_warning("%s: Invalid PE %p\n", __func__, root);
> +		return NULL;
> +	}
> +
> +	eeh_lock();
> +
> +	/* Traverse root PE */
> +	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
> +		eeh_pe_for_each_dev(pe, edev) {
> +			ret = fn(edev, flag);
> +			if (ret) {
> +				eeh_unlock();
> +				return ret;
> +			}
> +		}
> +	}
> +
> +	eeh_unlock();
> +
> +	return NULL;
> +}
> +
> +/**
> + * __eeh_pe_get - Check the PE address
> + * @data: EEH PE
> + * @flag: EEH device
> + *
> + * For one particular PE, it can be identified by PE address
> + * or tranditional BDF address. BDF address is composed of
> + * Bus/Device/Function number. The extra data referred by flag
> + * indicates which type of address should be used.
> + */
> +static void *__eeh_pe_get(void *data, void *flag)
> +{
> +	struct eeh_pe *pe = (struct eeh_pe *)data;
> +	struct eeh_dev *edev = (struct eeh_dev *)flag;
> +
> +	/* Unexpected PHB PE */
> +	if (pe->type & EEH_PE_PHB)
> +		return NULL;
> +
> +	/* We prefer PE address */
> +	if (edev->pe_config_addr &&
> +	   (edev->pe_config_addr == pe->addr))
> +		return pe;
> +
> +	/* Try BDF address */
> +	if (edev->pe_config_addr &&
> +	   (edev->config_addr == pe->config_addr))
> +		return pe;
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_get - Search PE based on the given address
> + * @edev: EEH device
> + *
> + * Search the corresponding PE based on the specified address which
> + * is included in the eeh device. The function is used to check if
> + * the associated PE has been created against the PE address. It's
> + * notable that the PE address has 2 format: traditional PE address
> + * which is composed of PCI bus/device/function number, or unified
> + * PE address.
> + */
> +static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
> +{
> +	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
> +	struct eeh_pe *pe;
> +
> +	pe = eeh_pe_traverse(root, __eeh_pe_get, edev);
> +
> +	return pe;
> +}
> +
> +/**
> + * eeh_pe_get_parent - Retrieve the parent PE
> + * @edev: EEH device
> + *
> + * The whole PEs existing in the system are organized as hierarchy
> + * tree. The function is used to retrieve the parent PE according
> + * to the parent EEH device.
> + */
> +static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
> +{
> +	struct device_node *dn;
> +	struct eeh_dev *parent;
> +
> +	/*
> +	 * It might have the case for the indirect parent
> +	 * EEH device already having associated PE, but
> +	 * the direct parent EEH device doesn't have yet.
> +	 */
> +	dn = edev->dn->parent;
> +	while (dn) {
> +		/* We're poking out of PCI territory */
> +		if (!PCI_DN(dn)) return NULL;
> +
> +		parent = of_node_to_eeh_dev(dn);
> +		/* We're poking out of PCI territory */
> +		if (!parent) return NULL;
> +
> +		if (parent->pe)
> +			return parent->pe;
> +
> +		dn = dn->parent;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_add_to_parent_pe - Add EEH device to parent PE
> + * @edev: EEH device
> + *
> + * Add EEH device to the parent PE. If the parent PE already
> + * exists, the PE type will be changed to EEH_PE_BUS. Otherwise,
> + * we have to create new PE to hold the EEH device and the new
> + * PE will be linked to its parent PE as well.
> + */
> +int eeh_add_to_parent_pe(struct eeh_dev *edev)
> +{
> +	struct eeh_pe *pe, *parent;
> +
> +	eeh_lock();
> +
> +	/*
> +	 * Search the PE has been existing or not according
> +	 * to the PE address. If that has been existing, the
> +	 * PE should be composed of PCI bus and its subordinate
> +	 * components.
> +	 */
> +	pe = eeh_pe_get(edev);
> +	if (pe && !(pe->type & EEH_PE_INVALID)) {
> +		if (!edev->pe_config_addr) {
> +			eeh_unlock();
> +			pr_err("%s: PE with addr 0x%x already exists\n",
> +				__func__, edev->config_addr);
> +			return -EEXIST;
> +		}
> +
> +		/* Mark the PE as type of PCI bus */
> +		pe->type = EEH_PE_BUS;
> +		edev->pe = pe;
> +
> +		/* Put the edev to PE */
> +		list_add_tail(&edev->list, &pe->edevs);
> +		eeh_unlock();
> +		pr_debug("EEH: Add %s to Bus PE#%x\n",
> +			edev->dn->full_name, pe->addr);
> +
> +		return 0;
> +	} else if (pe && (pe->type & EEH_PE_INVALID)) {
> +		list_add_tail(&edev->list, &pe->edevs);
> +		edev->pe = pe;
> +		/*
> +		 * We're running to here because of PCI hotplug caused by
> +		 * EEH recovery. We need clear EEH_PE_INVALID until the top.
> +		 */
> +		parent = pe;
> +		while (parent) {
> +			if (!(parent->type & EEH_PE_INVALID))
> +				break;
> +			parent->type &= ~EEH_PE_INVALID;
> +			parent = parent->parent;
> +		}
> +		eeh_unlock();
> +		pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
> +			edev->dn->full_name, pe->addr, pe->parent->addr);
> +
> +		return 0;
> +	}
> +
> +	/* Create a new EEH PE */
> +	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> +	if (!pe) {
> +		eeh_unlock();
> +		pr_err("%s: out of memory!\n", __func__);
> +		return -ENOMEM;
> +	}
> +	pe->addr	= edev->pe_config_addr;
> +	pe->config_addr	= edev->config_addr;
> +
> +	/*
> +	 * Put the new EEH PE into hierarchy tree. If the parent
> +	 * can't be found, the newly created PE will be attached
> +	 * to PHB directly. Otherwise, we have to associate the
> +	 * PE with its parent.
> +	 */
> +	parent = eeh_pe_get_parent(edev);
> +	if (!parent) {
> +		parent = eeh_phb_pe_get(edev->phb);
> +		if (!parent) {
> +			eeh_unlock();
> +			pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
> +				__func__, edev->phb->global_number);
> +			edev->pe = NULL;
> +			kfree(pe);
> +			return -EEXIST;
> +		}
> +	}
> +	pe->parent = parent;
> +
> +	/*
> +	 * Put the newly created PE into the child list and
> +	 * link the EEH device accordingly.
> +	 */
> +	list_add_tail(&pe->child, &parent->child_list);
> +	list_add_tail(&edev->list, &pe->edevs);
> +	edev->pe = pe;
> +	eeh_unlock();
> +	pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
> +		edev->dn->full_name, pe->addr, pe->parent->addr);
> +
> +	return 0;
> +}
> +
> +/**
> + * eeh_rmv_from_parent_pe - Remove one EEH device from the associated PE
> + * @edev: EEH device
> + * @purge_pe: remove PE or not
> + *
> + * The PE hierarchy tree might be changed when doing PCI hotplug.
> + * Also, the PCI devices or buses could be removed from the system
> + * during EEH recovery. So we have to call the function remove the
> + * corresponding PE accordingly if necessary.
> + */
> +int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe)
> +{
> +	struct eeh_pe *pe, *parent, *child;
> +	int cnt;
> +
> +	if (!edev->pe) {
> +		pr_warning("%s: No PE found for EEH device %s\n",
> +			__func__, edev->dn->full_name);
> +		return -EEXIST;
> +	}
> +
> +	eeh_lock();
> +
> +	/* Remove the EEH device */
> +	pe = edev->pe;
> +	edev->pe = NULL;
> +	list_del(&edev->list);
> +
> +	/*
> +	 * Check if the parent PE includes any EEH devices.
> +	 * If not, we should delete that. Also, we should
> +	 * delete the parent PE if it doesn't have associated
> +	 * child PEs and EEH devices.
> +	 */
> +	while (1) {
> +		parent = pe->parent;
> +		if (pe->type & EEH_PE_PHB)
> +			break;
> +
> +		if (purge_pe) {
> +			if (list_empty(&pe->edevs) &&
> +			    list_empty(&pe->child_list)) {
> +				list_del(&pe->child);
> +				kfree(pe);
> +			} else {
> +				break;
> +			}
> +		} else {
> +			if (list_empty(&pe->edevs)) {
> +				cnt = 0;
> +				list_for_each_entry(child, &pe->child_list, child) {
> +					if (!(child->type & EEH_PE_INVALID)) {
> +						cnt++;
> +						break;
> +					}
> +				}
> +
> +				if (!cnt)
> +					pe->type |= EEH_PE_INVALID;
> +				else
> +					break;
> +			}
> +		}
> +
> +		pe = parent;
> +	}
> +
> +	eeh_unlock();
> +
> +	return 0;
> +}
> +
> +/**
> + * __eeh_pe_state_mark - Mark the state for the PE
> + * @data: EEH PE
> + * @flag: state
> + *
> + * The function is used to mark the indicated state for the given
> + * PE. Also, the associated PCI devices will be put into IO frozen
> + * state as well.
> + */
> +static void *__eeh_pe_state_mark(void *data, void *flag)
> +{
> +	struct eeh_pe *pe = (struct eeh_pe *)data;
> +	int state = *((int *)flag);
> +	struct eeh_dev *tmp;
> +	struct pci_dev *pdev;
> +
> +	/*
> +	 * Mark the PE with the indicated state. Also,
> +	 * the associated PCI device will be put into
> +	 * I/O frozen state to avoid I/O accesses from
> +	 * the PCI device driver.
> +	 */
> +	pe->state |= state;
> +	eeh_pe_for_each_dev(pe, tmp) {
> +		pdev = eeh_dev_to_pci_dev(tmp);
> +		if (pdev)
> +			pdev->error_state = pci_channel_io_frozen;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_state_mark - Mark specified state for PE and its associated device
> + * @pe: EEH PE
> + *
> + * EEH error affects the current PE and its child PEs. The function
> + * is used to mark appropriate state for the affected PEs and the
> + * associated devices.
> + */
> +void eeh_pe_state_mark(struct eeh_pe *pe, int state)
> +{
> +	eeh_lock();
> +	eeh_pe_traverse(pe, __eeh_pe_state_mark, &state);
> +	eeh_unlock();
> +}
> +
> +/**
> + * __eeh_pe_state_clear - Clear state for the PE
> + * @data: EEH PE
> + * @flag: state
> + *
> + * The function is used to clear the indicated state from the
> + * given PE. Besides, we also clear the check count of the PE
> + * as well.
> + */
> +static void *__eeh_pe_state_clear(void *data, void *flag)
> +{
> +	struct eeh_pe *pe = (struct eeh_pe *)data;
> +	int state = *((int *)flag);
> +
> +	pe->state &= ~state;
> +	pe->check_count = 0;
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_state_clear - Clear state for the PE and its children
> + * @pe: PE
> + * @state: state to be cleared
> + *
> + * When the PE and its children has been recovered from error,
> + * we need clear the error state for that. The function is used
> + * for the purpose.
> + */
> +void eeh_pe_state_clear(struct eeh_pe *pe, int state)
> +{
> +	eeh_lock();
> +	eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
> +	eeh_unlock();
> +}
> +
> +/**
> + * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
> + * @data: EEH device
> + * @flag: Unused
> + *
> + * Loads the PCI configuration space base address registers,
> + * the expansion ROM base address, the latency timer, and etc.
> + * from the saved values in the device node.
> + */
> +static void *eeh_restore_one_device_bars(void *data, void *flag)
> +{
> +	int i;
> +	u32 cmd;
> +	struct eeh_dev *edev = (struct eeh_dev *)data;
> +	struct device_node *dn = eeh_dev_to_of_node(edev);
> +
> +	for (i = 4; i < 10; i++)
> +		eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
> +	/* 12 == Expansion ROM Address */
> +	eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]);
> +
> +#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
> +#define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
> +
> +	eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1,
> +		SAVED_BYTE(PCI_CACHE_LINE_SIZE));
> +	eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1,
> +		SAVED_BYTE(PCI_LATENCY_TIMER));
> +
> +	/* max latency, min grant, interrupt pin and line */
> +	eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]);
> +
> +	/*
> +	 * Restore PERR & SERR bits, some devices require it,
> +	 * don't touch the other command bits
> +	 */
> +	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
> +	if (edev->config_space[1] & PCI_COMMAND_PARITY)
> +		cmd |= PCI_COMMAND_PARITY;
> +	else
> +		cmd &= ~PCI_COMMAND_PARITY;
> +	if (edev->config_space[1] & PCI_COMMAND_SERR)
> +		cmd |= PCI_COMMAND_SERR;
> +	else
> +		cmd &= ~PCI_COMMAND_SERR;
> +	eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
> +
> +	return NULL;
> +}
> +
> +/**
> + * eeh_pe_restore_bars - Restore the PCI config space info
> + * @pe: EEH PE
> + *
> + * This routine performs a recursive walk to the children
> + * of this device as well.
> + */
> +void eeh_pe_restore_bars(struct eeh_pe *pe)
> +{
> +	/*
> +	 * We needn't take the EEH lock since eeh_pe_dev_traverse()
> +	 * will take that.
> +	 */
> +	eeh_pe_dev_traverse(pe, eeh_restore_one_device_bars, NULL);
> +}
> +
> +/**
> + * eeh_pe_bus_get - Retrieve PCI bus according to the given PE
> + * @pe: EEH PE
> + *
> + * Retrieve the PCI bus according to the given PE. Basically,
> + * there're 3 types of PEs: PHB/Bus/Device. For PHB PE, the
> + * primary PCI bus will be retrieved. The parent bus will be
> + * returned for BUS PE. However, we don't have associated PCI
> + * bus for DEVICE PE.
> + */
> +struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
> +{
> +	struct pci_bus *bus = NULL;
> +	struct eeh_dev *edev;
> +	struct pci_dev *pdev;
> +
> +	eeh_lock();
> +
> +	if (pe->type & EEH_PE_PHB) {
> +		bus = pe->phb->bus;
> +	} else if (pe->type & EEH_PE_BUS ||
> +		   pe->type & EEH_PE_DEVICE) {
> +		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
> +		pdev = eeh_dev_to_pci_dev(edev);
> +		if (pdev)
> +			bus = pdev->bus;
> +	}
> +
> +	eeh_unlock();
> +
> +	return bus;
> +}
> diff --git a/arch/powerpc/kernel/eeh_sysfs.c b/arch/powerpc/kernel/eeh_sysfs.c
> new file mode 100644
> index 0000000..d377083
> --- /dev/null
> +++ b/arch/powerpc/kernel/eeh_sysfs.c
> @@ -0,0 +1,75 @@
> +/*
> + * Sysfs entries for PCI Error Recovery for PAPR-compliant platform.
> + * Copyright IBM Corporation 2007
> + * Copyright Linas Vepstas <linas@austin.ibm.com> 2007
> + *
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or (at
> + * your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + *
> + * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> + */
> +#include <linux/pci.h>
> +#include <linux/stat.h>
> +#include <asm/ppc-pci.h>
> +#include <asm/pci-bridge.h>
> +
> +/**
> + * EEH_SHOW_ATTR -- Create sysfs entry for eeh statistic
> + * @_name: name of file in sysfs directory
> + * @_memb: name of member in struct pci_dn to access
> + * @_format: printf format for display
> + *
> + * All of the attributes look very similar, so just
> + * auto-gen a cut-n-paste routine to display them.
> + */
> +#define EEH_SHOW_ATTR(_name,_memb,_format)               \
> +static ssize_t eeh_show_##_name(struct device *dev,      \
> +		struct device_attribute *attr, char *buf)          \
> +{                                                        \
> +	struct pci_dev *pdev = to_pci_dev(dev);               \
> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);      \
> +	                                                      \
> +	if (!edev)                                            \
> +		return 0;                                     \
> +	                                                      \
> +	return sprintf(buf, _format "\n", edev->_memb);       \
> +}                                                        \
> +static DEVICE_ATTR(_name, S_IRUGO, eeh_show_##_name, NULL);
> +
> +EEH_SHOW_ATTR(eeh_mode,            mode,            "0x%x");
> +EEH_SHOW_ATTR(eeh_config_addr,     config_addr,     "0x%x");
> +EEH_SHOW_ATTR(eeh_pe_config_addr,  pe_config_addr,  "0x%x");
> +
> +void eeh_sysfs_add_device(struct pci_dev *pdev)
> +{
> +	int rc=0;
> +
> +	rc += device_create_file(&pdev->dev, &dev_attr_eeh_mode);
> +	rc += device_create_file(&pdev->dev, &dev_attr_eeh_config_addr);
> +	rc += device_create_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
> +
> +	if (rc)
> +		printk(KERN_WARNING "EEH: Unable to create sysfs entries\n");
> +}
> +
> +void eeh_sysfs_remove_device(struct pci_dev *pdev)
> +{
> +	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
> +	device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr);
> +	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
> +}
> +
> diff --git a/arch/powerpc/kernel/pci_hotplug.c b/arch/powerpc/kernel/pci_hotplug.c
> new file mode 100644
> index 0000000..3f60880
> --- /dev/null
> +++ b/arch/powerpc/kernel/pci_hotplug.c
> @@ -0,0 +1,111 @@
> +/*
> + * Derived from "arch/powerpc/platforms/pseries/pci_dlpar.c"
> + *
> + * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com>
> + * Copyright (C) 2005 International Business Machines
> + *
> + * Updates, 2005, John Rose <johnrose@austin.ibm.com>
> + * Updates, 2005, Linas Vepstas <linas@austin.ibm.com>
> + * Updates, 2013, Gavin Shan <shangw@linux.vnet.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/pci.h>
> +#include <linux/export.h>
> +#include <asm/pci-bridge.h>
> +#include <asm/ppc-pci.h>
> +#include <asm/firmware.h>
> +#include <asm/eeh.h>
> +
> +/**
> + * __pcibios_remove_pci_devices - remove all devices under this bus
> + * @bus: the indicated PCI bus
> + * @purge_pe: destroy the PE on removal of PCI devices
> + *
> + * Remove all of the PCI devices under this bus both from the
> + * linux pci device tree, and from the powerpc EEH address cache.
> + * By default, the corresponding PE will be destroied during the
> + * normal PCI hotplug path. For PCI hotplug during EEH recovery,
> + * the corresponding PE won't be destroied and deallocated.
> + */
> +void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe)
> +{
> +	struct pci_dev *dev, *tmp;
> +	struct pci_bus *child_bus;
> +
> +	/* First go down child busses */
> +	list_for_each_entry(child_bus, &bus->children, node)
> +		__pcibios_remove_pci_devices(child_bus, purge_pe);
> +
> +	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
> +		 pci_domain_nr(bus),  bus->number);
> +	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
> +		pr_debug("     * Removing %s...\n", pci_name(dev));
> +		eeh_remove_bus_device(dev, purge_pe);
> +		pci_stop_and_remove_bus_device(dev);
> +	}
> +}
> +
> +/**
> + * pcibios_remove_pci_devices - remove all devices under this bus
> + * @bus: the indicated PCI bus
> + *
> + * Remove all of the PCI devices under this bus both from the
> + * linux pci device tree, and from the powerpc EEH address cache.
> + */
> +void pcibios_remove_pci_devices(struct pci_bus *bus)
> +{
> +	__pcibios_remove_pci_devices(bus, 1);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
> +
> +/**
> + * pcibios_add_pci_devices - adds new pci devices to bus
> + * @bus: the indicated PCI bus
> + *
> + * This routine will find and fixup new pci devices under
> + * the indicated bus. This routine presumes that there
> + * might already be some devices under this bridge, so
> + * it carefully tries to add only new devices.  (And that
> + * is how this routine differs from other, similar pcibios
> + * routines.)
> + */
> +void pcibios_add_pci_devices(struct pci_bus * bus)
> +{
> +	int slotno, num, mode, pass, max;
> +	struct pci_dev *dev;
> +	struct device_node *dn = pci_bus_to_OF_node(bus);
> +
> +	eeh_add_device_tree_early(dn);
> +
> +	mode = PCI_PROBE_NORMAL;
> +	if (ppc_md.pci_probe_mode)
> +		mode = ppc_md.pci_probe_mode(bus);
> +
> +	if (mode == PCI_PROBE_DEVTREE) {
> +		/* use ofdt-based probe */
> +		of_rescan_bus(dn, bus);
> +	} else if (mode == PCI_PROBE_NORMAL) {
> +		/* use legacy probe */
> +		slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
> +		num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +		if (!num)
> +			return;
> +		pcibios_setup_bus_devices(bus);
> +		max = bus->busn_res.start;
> +		for (pass = 0; pass < 2; pass++) {
> +			list_for_each_entry(dev, &bus->devices, bus_list) {
> +				if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
> +				    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
> +					max = pci_scan_bridge(bus, dev,
> +							      max, pass);
> +			}
> +		}
> +	}
> +	pcibios_finish_adding_to_bus(bus);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index b62aab3..bed8c60 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -164,6 +164,11 @@ config IBMEBUS
>   	help
>   	  Bus device driver for GX bus based adapters.
>
> +config EEH
> +	bool
> +	depends on (PPC_POWERNV || PPC_PSERIES) && PCI
> +	default y
> +
>   config PPC_MPC106
>   	bool
>   	default n
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index 4459eff..1bd3399 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -33,11 +33,6 @@ config PPC_SPLPAR
>   	  processors, that is, which share physical processors between
>   	  two or more partitions.
>
> -config EEH
> -	bool
> -	depends on PPC_PSERIES && PCI
> -	default y
> -
>   config PSERIES_MSI
>          bool
>          depends on PCI_MSI && EEH
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index 53866e5..8ae0103 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -6,9 +6,7 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
>   			   firmware.o power.o dlpar.o mobility.o
>   obj-$(CONFIG_SMP)	+= smp.o
>   obj-$(CONFIG_SCANLOG)	+= scanlog.o
> -obj-$(CONFIG_EEH)	+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
> -			   eeh_driver.o eeh_event.o eeh_sysfs.o \
> -			   eeh_pseries.o
> +obj-$(CONFIG_EEH)	+= eeh_pseries.o
>   obj-$(CONFIG_KEXEC)	+= kexec.o
>   obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
>   obj-$(CONFIG_PSERIES_MSI)	+= msi.o
> diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
> deleted file mode 100644
> index 6b73d6c..0000000
> --- a/arch/powerpc/platforms/pseries/eeh.c
> +++ /dev/null
> @@ -1,942 +0,0 @@
> -/*
> - * Copyright IBM Corporation 2001, 2005, 2006
> - * Copyright Dave Engebretsen & Todd Inglett 2001
> - * Copyright Linas Vepstas 2005, 2006
> - * Copyright 2001-2012 IBM Corporation.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> - *
> - * Please address comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> - */
> -
> -#include <linux/delay.h>
> -#include <linux/sched.h>
> -#include <linux/init.h>
> -#include <linux/list.h>
> -#include <linux/pci.h>
> -#include <linux/proc_fs.h>
> -#include <linux/rbtree.h>
> -#include <linux/seq_file.h>
> -#include <linux/spinlock.h>
> -#include <linux/export.h>
> -#include <linux/of.h>
> -
> -#include <linux/atomic.h>
> -#include <asm/eeh.h>
> -#include <asm/eeh_event.h>
> -#include <asm/io.h>
> -#include <asm/machdep.h>
> -#include <asm/ppc-pci.h>
> -#include <asm/rtas.h>
> -
> -
> -/** Overview:
> - *  EEH, or "Extended Error Handling" is a PCI bridge technology for
> - *  dealing with PCI bus errors that can't be dealt with within the
> - *  usual PCI framework, except by check-stopping the CPU.  Systems
> - *  that are designed for high-availability/reliability cannot afford
> - *  to crash due to a "mere" PCI error, thus the need for EEH.
> - *  An EEH-capable bridge operates by converting a detected error
> - *  into a "slot freeze", taking the PCI adapter off-line, making
> - *  the slot behave, from the OS'es point of view, as if the slot
> - *  were "empty": all reads return 0xff's and all writes are silently
> - *  ignored.  EEH slot isolation events can be triggered by parity
> - *  errors on the address or data busses (e.g. during posted writes),
> - *  which in turn might be caused by low voltage on the bus, dust,
> - *  vibration, humidity, radioactivity or plain-old failed hardware.
> - *
> - *  Note, however, that one of the leading causes of EEH slot
> - *  freeze events are buggy device drivers, buggy device microcode,
> - *  or buggy device hardware.  This is because any attempt by the
> - *  device to bus-master data to a memory address that is not
> - *  assigned to the device will trigger a slot freeze.   (The idea
> - *  is to prevent devices-gone-wild from corrupting system memory).
> - *  Buggy hardware/drivers will have a miserable time co-existing
> - *  with EEH.
> - *
> - *  Ideally, a PCI device driver, when suspecting that an isolation
> - *  event has occurred (e.g. by reading 0xff's), will then ask EEH
> - *  whether this is the case, and then take appropriate steps to
> - *  reset the PCI slot, the PCI device, and then resume operations.
> - *  However, until that day,  the checking is done here, with the
> - *  eeh_check_failure() routine embedded in the MMIO macros.  If
> - *  the slot is found to be isolated, an "EEH Event" is synthesized
> - *  and sent out for processing.
> - */
> -
> -/* If a device driver keeps reading an MMIO register in an interrupt
> - * handler after a slot isolation event, it might be broken.
> - * This sets the threshold for how many read attempts we allow
> - * before printing an error message.
> - */
> -#define EEH_MAX_FAILS	2100000
> -
> -/* Time to wait for a PCI slot to report status, in milliseconds */
> -#define PCI_BUS_RESET_WAIT_MSEC (60*1000)
> -
> -/* Platform dependent EEH operations */
> -struct eeh_ops *eeh_ops = NULL;
> -
> -int eeh_subsystem_enabled;
> -EXPORT_SYMBOL(eeh_subsystem_enabled);
> -
> -/*
> - * EEH probe mode support. The intention is to support multiple
> - * platforms for EEH. Some platforms like pSeries do PCI emunation
> - * based on device tree. However, other platforms like powernv probe
> - * PCI devices from hardware. The flag is used to distinguish that.
> - * In addition, struct eeh_ops::probe would be invoked for particular
> - * OF node or PCI device so that the corresponding PE would be created
> - * there.
> - */
> -int eeh_probe_mode;
> -
> -/* Global EEH mutex */
> -DEFINE_MUTEX(eeh_mutex);
> -
> -/* Lock to avoid races due to multiple reports of an error */
> -static DEFINE_RAW_SPINLOCK(confirm_error_lock);
> -
> -/* Buffer for reporting pci register dumps. Its here in BSS, and
> - * not dynamically alloced, so that it ends up in RMO where RTAS
> - * can access it.
> - */
> -#define EEH_PCI_REGS_LOG_LEN 4096
> -static unsigned char pci_regs_buf[EEH_PCI_REGS_LOG_LEN];
> -
> -/*
> - * The struct is used to maintain the EEH global statistic
> - * information. Besides, the EEH global statistics will be
> - * exported to user space through procfs
> - */
> -struct eeh_stats {
> -	u64 no_device;		/* PCI device not found		*/
> -	u64 no_dn;		/* OF node not found		*/
> -	u64 no_cfg_addr;	/* Config address not found	*/
> -	u64 ignored_check;	/* EEH check skipped		*/
> -	u64 total_mmio_ffs;	/* Total EEH checks		*/
> -	u64 false_positives;	/* Unnecessary EEH checks	*/
> -	u64 slot_resets;	/* PE reset			*/
> -};
> -
> -static struct eeh_stats eeh_stats;
> -
> -#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
> -
> -/**
> - * eeh_gather_pci_data - Copy assorted PCI config space registers to buff
> - * @edev: device to report data for
> - * @buf: point to buffer in which to log
> - * @len: amount of room in buffer
> - *
> - * This routine captures assorted PCI configuration space data,
> - * and puts them into a buffer for RTAS error logging.
> - */
> -static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
> -{
> -	struct device_node *dn = eeh_dev_to_of_node(edev);
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	u32 cfg;
> -	int cap, i;
> -	int n = 0;
> -
> -	n += scnprintf(buf+n, len-n, "%s\n", dn->full_name);
> -	printk(KERN_WARNING "EEH: of node=%s\n", dn->full_name);
> -
> -	eeh_ops->read_config(dn, PCI_VENDOR_ID, 4, &cfg);
> -	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
> -	printk(KERN_WARNING "EEH: PCI device/vendor: %08x\n", cfg);
> -
> -	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cfg);
> -	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
> -	printk(KERN_WARNING "EEH: PCI cmd/status register: %08x\n", cfg);
> -
> -	if (!dev) {
> -		printk(KERN_WARNING "EEH: no PCI device for this of node\n");
> -		return n;
> -	}
> -
> -	/* Gather bridge-specific registers */
> -	if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
> -		eeh_ops->read_config(dn, PCI_SEC_STATUS, 2, &cfg);
> -		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
> -		printk(KERN_WARNING "EEH: Bridge secondary status: %04x\n", cfg);
> -
> -		eeh_ops->read_config(dn, PCI_BRIDGE_CONTROL, 2, &cfg);
> -		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
> -		printk(KERN_WARNING "EEH: Bridge control: %04x\n", cfg);
> -	}
> -
> -	/* Dump out the PCI-X command and status regs */
> -	cap = pci_find_capability(dev, PCI_CAP_ID_PCIX);
> -	if (cap) {
> -		eeh_ops->read_config(dn, cap, 4, &cfg);
> -		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
> -		printk(KERN_WARNING "EEH: PCI-X cmd: %08x\n", cfg);
> -
> -		eeh_ops->read_config(dn, cap+4, 4, &cfg);
> -		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
> -		printk(KERN_WARNING "EEH: PCI-X status: %08x\n", cfg);
> -	}
> -
> -	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
> -	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
> -	if (cap) {
> -		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
> -		printk(KERN_WARNING
> -		       "EEH: PCI-E capabilities and status follow:\n");
> -
> -		for (i=0; i<=8; i++) {
> -			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
> -			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
> -			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
> -		}
> -
> -		cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> -		if (cap) {
> -			n += scnprintf(buf+n, len-n, "pci-e AER:\n");
> -			printk(KERN_WARNING
> -			       "EEH: PCI-E AER capability register set follows:\n");
> -
> -			for (i=0; i<14; i++) {
> -				eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
> -				n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
> -				printk(KERN_WARNING "EEH: PCI-E AER %02x: %08x\n", i, cfg);
> -			}
> -		}
> -	}
> -
> -	return n;
> -}
> -
> -/**
> - * eeh_slot_error_detail - Generate combined log including driver log and error log
> - * @pe: EEH PE
> - * @severity: temporary or permanent error log
> - *
> - * This routine should be called to generate the combined log, which
> - * is comprised of driver log and error log. The driver log is figured
> - * out from the config space of the corresponding PCI device, while
> - * the error log is fetched through platform dependent function call.
> - */
> -void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
> -{
> -	size_t loglen = 0;
> -	struct eeh_dev *edev;
> -
> -	eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
> -	eeh_ops->configure_bridge(pe);
> -	eeh_pe_restore_bars(pe);
> -
> -	pci_regs_buf[0] = 0;
> -	eeh_pe_for_each_dev(pe, edev) {
> -		loglen += eeh_gather_pci_data(edev, pci_regs_buf,
> -				EEH_PCI_REGS_LOG_LEN);
> -        }
> -
> -	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
> -}
> -
> -/**
> - * eeh_token_to_phys - Convert EEH address token to phys address
> - * @token: I/O token, should be address in the form 0xA....
> - *
> - * This routine should be called to convert virtual I/O address
> - * to physical one.
> - */
> -static inline unsigned long eeh_token_to_phys(unsigned long token)
> -{
> -	pte_t *ptep;
> -	unsigned long pa;
> -
> -	ptep = find_linux_pte(init_mm.pgd, token);
> -	if (!ptep)
> -		return token;
> -	pa = pte_pfn(*ptep) << PAGE_SHIFT;
> -
> -	return pa | (token & (PAGE_SIZE-1));
> -}
> -
> -/**
> - * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
> - * @edev: eeh device
> - *
> - * Check for an EEH failure for the given device node.  Call this
> - * routine if the result of a read was all 0xff's and you want to
> - * find out if this is due to an EEH slot freeze.  This routine
> - * will query firmware for the EEH status.
> - *
> - * Returns 0 if there has not been an EEH error; otherwise returns
> - * a non-zero value and queues up a slot isolation event notification.
> - *
> - * It is safe to call this routine in an interrupt context.
> - */
> -int eeh_dev_check_failure(struct eeh_dev *edev)
> -{
> -	int ret;
> -	unsigned long flags;
> -	struct device_node *dn;
> -	struct pci_dev *dev;
> -	struct eeh_pe *pe;
> -	int rc = 0;
> -	const char *location;
> -
> -	eeh_stats.total_mmio_ffs++;
> -
> -	if (!eeh_subsystem_enabled)
> -		return 0;
> -
> -	if (!edev) {
> -		eeh_stats.no_dn++;
> -		return 0;
> -	}
> -	dn = eeh_dev_to_of_node(edev);
> -	dev = eeh_dev_to_pci_dev(edev);
> -	pe = edev->pe;
> -
> -	/* Access to IO BARs might get this far and still not want checking. */
> -	if (!pe) {
> -		eeh_stats.ignored_check++;
> -		pr_debug("EEH: Ignored check for %s %s\n",
> -			eeh_pci_name(dev), dn->full_name);
> -		return 0;
> -	}
> -
> -	if (!pe->addr && !pe->config_addr) {
> -		eeh_stats.no_cfg_addr++;
> -		return 0;
> -	}
> -
> -	/* If we already have a pending isolation event for this
> -	 * slot, we know it's bad already, we don't need to check.
> -	 * Do this checking under a lock; as multiple PCI devices
> -	 * in one slot might report errors simultaneously, and we
> -	 * only want one error recovery routine running.
> -	 */
> -	raw_spin_lock_irqsave(&confirm_error_lock, flags);
> -	rc = 1;
> -	if (pe->state & EEH_PE_ISOLATED) {
> -		pe->check_count++;
> -		if (pe->check_count % EEH_MAX_FAILS == 0) {
> -			location = of_get_property(dn, "ibm,loc-code", NULL);
> -			printk(KERN_ERR "EEH: %d reads ignored for recovering device at "
> -				"location=%s driver=%s pci addr=%s\n",
> -				pe->check_count, location,
> -				eeh_driver_name(dev), eeh_pci_name(dev));
> -			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
> -				eeh_driver_name(dev));
> -			dump_stack();
> -		}
> -		goto dn_unlock;
> -	}
> -
> -	/*
> -	 * Now test for an EEH failure.  This is VERY expensive.
> -	 * Note that the eeh_config_addr may be a parent device
> -	 * in the case of a device behind a bridge, or it may be
> -	 * function zero of a multi-function device.
> -	 * In any case they must share a common PHB.
> -	 */
> -	ret = eeh_ops->get_state(pe, NULL);
> -
> -	/* Note that config-io to empty slots may fail;
> -	 * they are empty when they don't have children.
> -	 * We will punt with the following conditions: Failure to get
> -	 * PE's state, EEH not support and Permanently unavailable
> -	 * state, PE is in good state.
> -	 */
> -	if ((ret < 0) ||
> -	    (ret == EEH_STATE_NOT_SUPPORT) ||
> -	    (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
> -	    (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
> -		eeh_stats.false_positives++;
> -		pe->false_positives++;
> -		rc = 0;
> -		goto dn_unlock;
> -	}
> -
> -	eeh_stats.slot_resets++;
> -
> -	/* Avoid repeated reports of this failure, including problems
> -	 * with other functions on this device, and functions under
> -	 * bridges.
> -	 */
> -	eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
> -	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
> -
> -	eeh_send_failure_event(pe);
> -
> -	/* Most EEH events are due to device driver bugs.  Having
> -	 * a stack trace will help the device-driver authors figure
> -	 * out what happened.  So print that out.
> -	 */
> -	WARN(1, "EEH: failure detected\n");
> -	return 1;
> -
> -dn_unlock:
> -	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
> -	return rc;
> -}
> -
> -EXPORT_SYMBOL_GPL(eeh_dev_check_failure);
> -
> -/**
> - * eeh_check_failure - Check if all 1's data is due to EEH slot freeze
> - * @token: I/O token, should be address in the form 0xA....
> - * @val: value, should be all 1's (XXX why do we need this arg??)
> - *
> - * Check for an EEH failure at the given token address.  Call this
> - * routine if the result of a read was all 0xff's and you want to
> - * find out if this is due to an EEH slot freeze event.  This routine
> - * will query firmware for the EEH status.
> - *
> - * Note this routine is safe to call in an interrupt context.
> - */
> -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
> -{
> -	unsigned long addr;
> -	struct eeh_dev *edev;
> -
> -	/* Finding the phys addr + pci device; this is pretty quick. */
> -	addr = eeh_token_to_phys((unsigned long __force) token);
> -	edev = eeh_addr_cache_get_dev(addr);
> -	if (!edev) {
> -		eeh_stats.no_device++;
> -		return val;
> -	}
> -
> -	eeh_dev_check_failure(edev);
> -
> -	pci_dev_put(eeh_dev_to_pci_dev(edev));
> -	return val;
> -}
> -
> -EXPORT_SYMBOL(eeh_check_failure);
> -
> -
> -/**
> - * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
> - * @pe: EEH PE
> - *
> - * This routine should be called to reenable frozen MMIO or DMA
> - * so that it would work correctly again. It's useful while doing
> - * recovery or log collection on the indicated device.
> - */
> -int eeh_pci_enable(struct eeh_pe *pe, int function)
> -{
> -	int rc;
> -
> -	rc = eeh_ops->set_option(pe, function);
> -	if (rc)
> -		pr_warning("%s: Unexpected state change %d on PHB#%d-PE#%x, err=%d\n",
> -			__func__, function, pe->phb->global_number, pe->addr, rc);
> -
> -	rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
> -	if (rc > 0 && (rc & EEH_STATE_MMIO_ENABLED) &&
> -	   (function == EEH_OPT_THAW_MMIO))
> -		return 0;
> -
> -	return rc;
> -}
> -
> -/**
> - * pcibios_set_pcie_slot_reset - Set PCI-E reset state
> - * @dev: pci device struct
> - * @state: reset state to enter
> - *
> - * Return value:
> - * 	0 if success
> - */
> -int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
> -{
> -	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> -	struct eeh_pe *pe = edev->pe;
> -
> -	if (!pe) {
> -		pr_err("%s: No PE found on PCI device %s\n",
> -			__func__, pci_name(dev));
> -		return -EINVAL;
> -	}
> -
> -	switch (state) {
> -	case pcie_deassert_reset:
> -		eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
> -		break;
> -	case pcie_hot_reset:
> -		eeh_ops->reset(pe, EEH_RESET_HOT);
> -		break;
> -	case pcie_warm_reset:
> -		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
> -		break;
> -	default:
> -		return -EINVAL;
> -	};
> -
> -	return 0;
> -}
> -
> -/**
> - * eeh_set_pe_freset - Check the required reset for the indicated device
> - * @data: EEH device
> - * @flag: return value
> - *
> - * Each device might have its preferred reset type: fundamental or
> - * hot reset. The routine is used to collected the information for
> - * the indicated device and its children so that the bunch of the
> - * devices could be reset properly.
> - */
> -static void *eeh_set_dev_freset(void *data, void *flag)
> -{
> -	struct pci_dev *dev;
> -	unsigned int *freset = (unsigned int *)flag;
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -
> -	dev = eeh_dev_to_pci_dev(edev);
> -	if (dev)
> -		*freset |= dev->needs_freset;
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_reset_pe_once - Assert the pci #RST line for 1/4 second
> - * @pe: EEH PE
> - *
> - * Assert the PCI #RST line for 1/4 second.
> - */
> -static void eeh_reset_pe_once(struct eeh_pe *pe)
> -{
> -	unsigned int freset = 0;
> -
> -	/* Determine type of EEH reset required for
> -	 * Partitionable Endpoint, a hot-reset (1)
> -	 * or a fundamental reset (3).
> -	 * A fundamental reset required by any device under
> -	 * Partitionable Endpoint trumps hot-reset.
> -  	 */
> -	eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
> -
> -	if (freset)
> -		eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
> -	else
> -		eeh_ops->reset(pe, EEH_RESET_HOT);
> -
> -	/* The PCI bus requires that the reset be held high for at least
> -	 * a 100 milliseconds. We wait a bit longer 'just in case'.
> -	 */
> -#define PCI_BUS_RST_HOLD_TIME_MSEC 250
> -	msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
> -	
> -	/* We might get hit with another EEH freeze as soon as the
> -	 * pci slot reset line is dropped. Make sure we don't miss
> -	 * these, and clear the flag now.
> -	 */
> -	eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
> -
> -	eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
> -
> -	/* After a PCI slot has been reset, the PCI Express spec requires
> -	 * a 1.5 second idle time for the bus to stabilize, before starting
> -	 * up traffic.
> -	 */
> -#define PCI_BUS_SETTLE_TIME_MSEC 1800
> -	msleep(PCI_BUS_SETTLE_TIME_MSEC);
> -}
> -
> -/**
> - * eeh_reset_pe - Reset the indicated PE
> - * @pe: EEH PE
> - *
> - * This routine should be called to reset indicated device, including
> - * PE. A PE might include multiple PCI devices and sometimes PCI bridges
> - * might be involved as well.
> - */
> -int eeh_reset_pe(struct eeh_pe *pe)
> -{
> -	int i, rc;
> -
> -	/* Take three shots at resetting the bus */
> -	for (i=0; i<3; i++) {
> -		eeh_reset_pe_once(pe);
> -
> -		rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
> -		if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
> -			return 0;
> -
> -		if (rc < 0) {
> -			pr_err("%s: Unrecoverable slot failure on PHB#%d-PE#%x",
> -				__func__, pe->phb->global_number, pe->addr);
> -			return -1;
> -		}
> -		pr_err("EEH: bus reset %d failed on PHB#%d-PE#%x, rc=%d\n",
> -			i+1, pe->phb->global_number, pe->addr, rc);
> -	}
> -
> -	return -1;
> -}
> -
> -/**
> - * eeh_save_bars - Save device bars
> - * @edev: PCI device associated EEH device
> - *
> - * Save the values of the device bars. Unlike the restore
> - * routine, this routine is *not* recursive. This is because
> - * PCI devices are added individually; but, for the restore,
> - * an entire slot is reset at a time.
> - */
> -void eeh_save_bars(struct eeh_dev *edev)
> -{
> -	int i;
> -	struct device_node *dn;
> -
> -	if (!edev)
> -		return;
> -	dn = eeh_dev_to_of_node(edev);
> -	
> -	for (i = 0; i < 16; i++)
> -		eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
> -}
> -
> -/**
> - * eeh_ops_register - Register platform dependent EEH operations
> - * @ops: platform dependent EEH operations
> - *
> - * Register the platform dependent EEH operation callback
> - * functions. The platform should call this function before
> - * any other EEH operations.
> - */
> -int __init eeh_ops_register(struct eeh_ops *ops)
> -{
> -	if (!ops->name) {
> -		pr_warning("%s: Invalid EEH ops name for %p\n",
> -			__func__, ops);
> -		return -EINVAL;
> -	}
> -
> -	if (eeh_ops && eeh_ops != ops) {
> -		pr_warning("%s: EEH ops of platform %s already existing (%s)\n",
> -			__func__, eeh_ops->name, ops->name);
> -		return -EEXIST;
> -	}
> -
> -	eeh_ops = ops;
> -
> -	return 0;
> -}
> -
> -/**
> - * eeh_ops_unregister - Unreigster platform dependent EEH operations
> - * @name: name of EEH platform operations
> - *
> - * Unregister the platform dependent EEH operation callback
> - * functions.
> - */
> -int __exit eeh_ops_unregister(const char *name)
> -{
> -	if (!name || !strlen(name)) {
> -		pr_warning("%s: Invalid EEH ops name\n",
> -			__func__);
> -		return -EINVAL;
> -	}
> -
> -	if (eeh_ops && !strcmp(eeh_ops->name, name)) {
> -		eeh_ops = NULL;
> -		return 0;
> -	}
> -
> -	return -EEXIST;
> -}
> -
> -/**
> - * eeh_init - EEH initialization
> - *
> - * Initialize EEH by trying to enable it for all of the adapters in the system.
> - * As a side effect we can determine here if eeh is supported at all.
> - * Note that we leave EEH on so failed config cycles won't cause a machine
> - * check.  If a user turns off EEH for a particular adapter they are really
> - * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
> - * grant access to a slot if EEH isn't enabled, and so we always enable
> - * EEH for all slots/all devices.
> - *
> - * The eeh-force-off option disables EEH checking globally, for all slots.
> - * Even if force-off is set, the EEH hardware is still enabled, so that
> - * newer systems can boot.
> - */
> -static int __init eeh_init(void)
> -{
> -	struct pci_controller *hose, *tmp;
> -	struct device_node *phb;
> -	int ret;
> -
> -	/* call platform initialization function */
> -	if (!eeh_ops) {
> -		pr_warning("%s: Platform EEH operation not found\n",
> -			__func__);
> -		return -EEXIST;
> -	} else if ((ret = eeh_ops->init())) {
> -		pr_warning("%s: Failed to call platform init function (%d)\n",
> -			__func__, ret);
> -		return ret;
> -	}
> -
> -	raw_spin_lock_init(&confirm_error_lock);
> -
> -	/* Enable EEH for all adapters */
> -	if (eeh_probe_mode_devtree()) {
> -		list_for_each_entry_safe(hose, tmp,
> -			&hose_list, list_node) {
> -			phb = hose->dn;
> -			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
> -		}
> -	}
> -
> -	if (eeh_subsystem_enabled)
> -		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> -	else
> -		pr_warning("EEH: No capable adapters found\n");
> -
> -	return ret;
> -}
> -
> -core_initcall_sync(eeh_init);
> -
> -/**
> - * eeh_add_device_early - Enable EEH for the indicated device_node
> - * @dn: device node for which to set up EEH
> - *
> - * This routine must be used to perform EEH initialization for PCI
> - * devices that were added after system boot (e.g. hotplug, dlpar).
> - * This routine must be called before any i/o is performed to the
> - * adapter (inluding any config-space i/o).
> - * Whether this actually enables EEH or not for this device depends
> - * on the CEC architecture, type of the device, on earlier boot
> - * command-line arguments & etc.
> - */
> -static void eeh_add_device_early(struct device_node *dn)
> -{
> -	struct pci_controller *phb;
> -
> -	if (!of_node_to_eeh_dev(dn))
> -		return;
> -	phb = of_node_to_eeh_dev(dn)->phb;
> -
> -	/* USB Bus children of PCI devices will not have BUID's */
> -	if (NULL == phb || 0 == phb->buid)
> -		return;
> -
> -	/* FIXME: hotplug support on POWERNV */
> -	eeh_ops->of_probe(dn, NULL);
> -}
> -
> -/**
> - * eeh_add_device_tree_early - Enable EEH for the indicated device
> - * @dn: device node
> - *
> - * This routine must be used to perform EEH initialization for the
> - * indicated PCI device that was added after system boot (e.g.
> - * hotplug, dlpar).
> - */
> -void eeh_add_device_tree_early(struct device_node *dn)
> -{
> -	struct device_node *sib;
> -
> -	for_each_child_of_node(dn, sib)
> -		eeh_add_device_tree_early(sib);
> -	eeh_add_device_early(dn);
> -}
> -EXPORT_SYMBOL_GPL(eeh_add_device_tree_early);
> -
> -/**
> - * eeh_add_device_late - Perform EEH initialization for the indicated pci device
> - * @dev: pci device for which to set up EEH
> - *
> - * This routine must be used to complete EEH initialization for PCI
> - * devices that were added after system boot (e.g. hotplug, dlpar).
> - */
> -static void eeh_add_device_late(struct pci_dev *dev)
> -{
> -	struct device_node *dn;
> -	struct eeh_dev *edev;
> -
> -	if (!dev || !eeh_subsystem_enabled)
> -		return;
> -
> -	pr_debug("EEH: Adding device %s\n", pci_name(dev));
> -
> -	dn = pci_device_to_OF_node(dev);
> -	edev = of_node_to_eeh_dev(dn);
> -	if (edev->pdev == dev) {
> -		pr_debug("EEH: Already referenced !\n");
> -		return;
> -	}
> -	WARN_ON(edev->pdev);
> -
> -	pci_dev_get(dev);
> -	edev->pdev = dev;
> -	dev->dev.archdata.edev = edev;
> -
> -	eeh_addr_cache_insert_dev(dev);
> -}
> -
> -/**
> - * eeh_add_device_tree_late - Perform EEH initialization for the indicated PCI bus
> - * @bus: PCI bus
> - *
> - * This routine must be used to perform EEH initialization for PCI
> - * devices which are attached to the indicated PCI bus. The PCI bus
> - * is added after system boot through hotplug or dlpar.
> - */
> -void eeh_add_device_tree_late(struct pci_bus *bus)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &bus->devices, bus_list) {
> - 		eeh_add_device_late(dev);
> - 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> - 			struct pci_bus *subbus = dev->subordinate;
> - 			if (subbus)
> - 				eeh_add_device_tree_late(subbus);
> - 		}
> -	}
> -}
> -EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
> -
> -/**
> - * eeh_add_sysfs_files - Add EEH sysfs files for the indicated PCI bus
> - * @bus: PCI bus
> - *
> - * This routine must be used to add EEH sysfs files for PCI
> - * devices which are attached to the indicated PCI bus. The PCI bus
> - * is added after system boot through hotplug or dlpar.
> - */
> -void eeh_add_sysfs_files(struct pci_bus *bus)
> -{
> -	struct pci_dev *dev;
> -
> -	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		eeh_sysfs_add_device(dev);
> -		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> -			struct pci_bus *subbus = dev->subordinate;
> -			if (subbus)
> -				eeh_add_sysfs_files(subbus);
> -		}
> -	}
> -}
> -EXPORT_SYMBOL_GPL(eeh_add_sysfs_files);
> -
> -/**
> - * eeh_remove_device - Undo EEH setup for the indicated pci device
> - * @dev: pci device to be removed
> - * @purge_pe: remove the PE or not
> - *
> - * This routine should be called when a device is removed from
> - * a running system (e.g. by hotplug or dlpar).  It unregisters
> - * the PCI device from the EEH subsystem.  I/O errors affecting
> - * this device will no longer be detected after this call; thus,
> - * i/o errors affecting this slot may leave this device unusable.
> - */
> -static void eeh_remove_device(struct pci_dev *dev, int purge_pe)
> -{
> -	struct eeh_dev *edev;
> -
> -	if (!dev || !eeh_subsystem_enabled)
> -		return;
> -	edev = pci_dev_to_eeh_dev(dev);
> -
> -	/* Unregister the device with the EEH/PCI address search system */
> -	pr_debug("EEH: Removing device %s\n", pci_name(dev));
> -
> -	if (!edev || !edev->pdev) {
> -		pr_debug("EEH: Not referenced !\n");
> -		return;
> -	}
> -	edev->pdev = NULL;
> -	dev->dev.archdata.edev = NULL;
> -	pci_dev_put(dev);
> -
> -	eeh_rmv_from_parent_pe(edev, purge_pe);
> -	eeh_addr_cache_rmv_dev(dev);
> -	eeh_sysfs_remove_device(dev);
> -}
> -
> -/**
> - * eeh_remove_bus_device - Undo EEH setup for the indicated PCI device
> - * @dev: PCI device
> - * @purge_pe: remove the corresponding PE or not
> - *
> - * This routine must be called when a device is removed from the
> - * running system through hotplug or dlpar. The corresponding
> - * PCI address cache will be removed.
> - */
> -void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe)
> -{
> -	struct pci_bus *bus = dev->subordinate;
> -	struct pci_dev *child, *tmp;
> -
> -	eeh_remove_device(dev, purge_pe);
> -
> -	if (bus && dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> -		list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
> -			 eeh_remove_bus_device(child, purge_pe);
> -	}
> -}
> -EXPORT_SYMBOL_GPL(eeh_remove_bus_device);
> -
> -static int proc_eeh_show(struct seq_file *m, void *v)
> -{
> -	if (0 == eeh_subsystem_enabled) {
> -		seq_printf(m, "EEH Subsystem is globally disabled\n");
> -		seq_printf(m, "eeh_total_mmio_ffs=%llu\n", eeh_stats.total_mmio_ffs);
> -	} else {
> -		seq_printf(m, "EEH Subsystem is enabled\n");
> -		seq_printf(m,
> -				"no device=%llu\n"
> -				"no device node=%llu\n"
> -				"no config address=%llu\n"
> -				"check not wanted=%llu\n"
> -				"eeh_total_mmio_ffs=%llu\n"
> -				"eeh_false_positives=%llu\n"
> -				"eeh_slot_resets=%llu\n",
> -				eeh_stats.no_device,
> -				eeh_stats.no_dn,
> -				eeh_stats.no_cfg_addr,
> -				eeh_stats.ignored_check,
> -				eeh_stats.total_mmio_ffs,
> -				eeh_stats.false_positives,
> -				eeh_stats.slot_resets);
> -	}
> -
> -	return 0;
> -}
> -
> -static int proc_eeh_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, proc_eeh_show, NULL);
> -}
> -
> -static const struct file_operations proc_eeh_operations = {
> -	.open      = proc_eeh_open,
> -	.read      = seq_read,
> -	.llseek    = seq_lseek,
> -	.release   = single_release,
> -};
> -
> -static int __init eeh_init_proc(void)
> -{
> -	if (machine_is(pseries))
> -		proc_create("powerpc/eeh", 0, NULL, &proc_eeh_operations);
> -	return 0;
> -}
> -__initcall(eeh_init_proc);
> diff --git a/arch/powerpc/platforms/pseries/eeh_cache.c b/arch/powerpc/platforms/pseries/eeh_cache.c
> deleted file mode 100644
> index 5a4c879..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_cache.c
> +++ /dev/null
> @@ -1,319 +0,0 @@
> -/*
> - * PCI address cache; allows the lookup of PCI devices based on I/O address
> - *
> - * Copyright IBM Corporation 2004
> - * Copyright Linas Vepstas <linas@austin.ibm.com> 2004
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> - */
> -
> -#include <linux/list.h>
> -#include <linux/pci.h>
> -#include <linux/rbtree.h>
> -#include <linux/slab.h>
> -#include <linux/spinlock.h>
> -#include <linux/atomic.h>
> -#include <asm/pci-bridge.h>
> -#include <asm/ppc-pci.h>
> -
> -
> -/**
> - * The pci address cache subsystem.  This subsystem places
> - * PCI device address resources into a red-black tree, sorted
> - * according to the address range, so that given only an i/o
> - * address, the corresponding PCI device can be **quickly**
> - * found. It is safe to perform an address lookup in an interrupt
> - * context; this ability is an important feature.
> - *
> - * Currently, the only customer of this code is the EEH subsystem;
> - * thus, this code has been somewhat tailored to suit EEH better.
> - * In particular, the cache does *not* hold the addresses of devices
> - * for which EEH is not enabled.
> - *
> - * (Implementation Note: The RB tree seems to be better/faster
> - * than any hash algo I could think of for this problem, even
> - * with the penalty of slow pointer chases for d-cache misses).
> - */
> -struct pci_io_addr_range {
> -	struct rb_node rb_node;
> -	unsigned long addr_lo;
> -	unsigned long addr_hi;
> -	struct eeh_dev *edev;
> -	struct pci_dev *pcidev;
> -	unsigned int flags;
> -};
> -
> -static struct pci_io_addr_cache {
> -	struct rb_root rb_root;
> -	spinlock_t piar_lock;
> -} pci_io_addr_cache_root;
> -
> -static inline struct eeh_dev *__eeh_addr_cache_get_device(unsigned long addr)
> -{
> -	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
> -
> -	while (n) {
> -		struct pci_io_addr_range *piar;
> -		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> -
> -		if (addr < piar->addr_lo) {
> -			n = n->rb_left;
> -		} else {
> -			if (addr > piar->addr_hi) {
> -				n = n->rb_right;
> -			} else {
> -				pci_dev_get(piar->pcidev);
> -				return piar->edev;
> -			}
> -		}
> -	}
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_addr_cache_get_dev - Get device, given only address
> - * @addr: mmio (PIO) phys address or i/o port number
> - *
> - * Given an mmio phys address, or a port number, find a pci device
> - * that implements this address.  Be sure to pci_dev_put the device
> - * when finished.  I/O port numbers are assumed to be offset
> - * from zero (that is, they do *not* have pci_io_addr added in).
> - * It is safe to call this function within an interrupt.
> - */
> -struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr)
> -{
> -	struct eeh_dev *edev;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> -	edev = __eeh_addr_cache_get_device(addr);
> -	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> -	return edev;
> -}
> -
> -#ifdef DEBUG
> -/*
> - * Handy-dandy debug print routine, does nothing more
> - * than print out the contents of our addr cache.
> - */
> -static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
> -{
> -	struct rb_node *n;
> -	int cnt = 0;
> -
> -	n = rb_first(&cache->rb_root);
> -	while (n) {
> -		struct pci_io_addr_range *piar;
> -		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> -		pr_debug("PCI: %s addr range %d [%lx-%lx]: %s\n",
> -		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> -		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
> -		cnt++;
> -		n = rb_next(n);
> -	}
> -}
> -#endif
> -
> -/* Insert address range into the rb tree. */
> -static struct pci_io_addr_range *
> -eeh_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
> -		      unsigned long ahi, unsigned int flags)
> -{
> -	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
> -	struct rb_node *parent = NULL;
> -	struct pci_io_addr_range *piar;
> -
> -	/* Walk tree, find a place to insert into tree */
> -	while (*p) {
> -		parent = *p;
> -		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
> -		if (ahi < piar->addr_lo) {
> -			p = &parent->rb_left;
> -		} else if (alo > piar->addr_hi) {
> -			p = &parent->rb_right;
> -		} else {
> -			if (dev != piar->pcidev ||
> -			    alo != piar->addr_lo || ahi != piar->addr_hi) {
> -				pr_warning("PIAR: overlapping address range\n");
> -			}
> -			return piar;
> -		}
> -	}
> -	piar = kzalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
> -	if (!piar)
> -		return NULL;
> -
> -	pci_dev_get(dev);
> -	piar->addr_lo = alo;
> -	piar->addr_hi = ahi;
> -	piar->edev = pci_dev_to_eeh_dev(dev);
> -	piar->pcidev = dev;
> -	piar->flags = flags;
> -
> -#ifdef DEBUG
> -	pr_debug("PIAR: insert range=[%lx:%lx] dev=%s\n",
> -	                  alo, ahi, pci_name(dev));
> -#endif
> -
> -	rb_link_node(&piar->rb_node, parent, p);
> -	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
> -
> -	return piar;
> -}
> -
> -static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
> -{
> -	struct device_node *dn;
> -	struct eeh_dev *edev;
> -	int i;
> -
> -	dn = pci_device_to_OF_node(dev);
> -	if (!dn) {
> -		pr_warning("PCI: no pci dn found for dev=%s\n", pci_name(dev));
> -		return;
> -	}
> -
> -	edev = of_node_to_eeh_dev(dn);
> -	if (!edev) {
> -		pr_warning("PCI: no EEH dev found for dn=%s\n",
> -			dn->full_name);
> -		return;
> -	}
> -
> -	/* Skip any devices for which EEH is not enabled. */
> -	if (!edev->pe) {
> -#ifdef DEBUG
> -		pr_info("PCI: skip building address cache for=%s - %s\n",
> -			pci_name(dev), dn->full_name);
> -#endif
> -		return;
> -	}
> -
> -	/* Walk resources on this device, poke them into the tree */
> -	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> -		unsigned long start = pci_resource_start(dev,i);
> -		unsigned long end = pci_resource_end(dev,i);
> -		unsigned int flags = pci_resource_flags(dev,i);
> -
> -		/* We are interested only bus addresses, not dma or other stuff */
> -		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
> -			continue;
> -		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
> -			 continue;
> -		eeh_addr_cache_insert(dev, start, end, flags);
> -	}
> -}
> -
> -/**
> - * eeh_addr_cache_insert_dev - Add a device to the address cache
> - * @dev: PCI device whose I/O addresses we are interested in.
> - *
> - * In order to support the fast lookup of devices based on addresses,
> - * we maintain a cache of devices that can be quickly searched.
> - * This routine adds a device to that cache.
> - */
> -void eeh_addr_cache_insert_dev(struct pci_dev *dev)
> -{
> -	unsigned long flags;
> -
> -	/* Ignore PCI bridges */
> -	if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
> -		return;
> -
> -	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> -	__eeh_addr_cache_insert_dev(dev);
> -	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> -}
> -
> -static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
> -{
> -	struct rb_node *n;
> -
> -restart:
> -	n = rb_first(&pci_io_addr_cache_root.rb_root);
> -	while (n) {
> -		struct pci_io_addr_range *piar;
> -		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> -
> -		if (piar->pcidev == dev) {
> -			rb_erase(n, &pci_io_addr_cache_root.rb_root);
> -			pci_dev_put(piar->pcidev);
> -			kfree(piar);
> -			goto restart;
> -		}
> -		n = rb_next(n);
> -	}
> -}
> -
> -/**
> - * eeh_addr_cache_rmv_dev - remove pci device from addr cache
> - * @dev: device to remove
> - *
> - * Remove a device from the addr-cache tree.
> - * This is potentially expensive, since it will walk
> - * the tree multiple times (once per resource).
> - * But so what; device removal doesn't need to be that fast.
> - */
> -void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
> -	__eeh_addr_cache_rmv_dev(dev);
> -	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
> -}
> -
> -/**
> - * eeh_addr_cache_build - Build a cache of I/O addresses
> - *
> - * Build a cache of pci i/o addresses.  This cache will be used to
> - * find the pci device that corresponds to a given address.
> - * This routine scans all pci busses to build the cache.
> - * Must be run late in boot process, after the pci controllers
> - * have been scanned for devices (after all device resources are known).
> - */
> -void __init eeh_addr_cache_build(void)
> -{
> -	struct device_node *dn;
> -	struct eeh_dev *edev;
> -	struct pci_dev *dev = NULL;
> -
> -	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> -
> -	for_each_pci_dev(dev) {
> -		eeh_addr_cache_insert_dev(dev);
> -
> -		dn = pci_device_to_OF_node(dev);
> -		if (!dn)
> -			continue;
> -
> -		edev = of_node_to_eeh_dev(dn);
> -		if (!edev)
> -			continue;
> -
> -		pci_dev_get(dev);  /* matching put is in eeh_remove_device() */
> -		dev->dev.archdata.edev = edev;
> -		edev->pdev = dev;
> -
> -		eeh_sysfs_add_device(dev);
> -	}
> -
> -#ifdef DEBUG
> -	/* Verify tree built up above, echo back the list of addrs. */
> -	eeh_addr_cache_print(&pci_io_addr_cache_root);
> -#endif
> -}
> -
> diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
> deleted file mode 100644
> index 1efa28f..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_dev.c
> +++ /dev/null
> @@ -1,112 +0,0 @@
> -/*
> - * The file intends to implement dynamic creation of EEH device, which will
> - * be bound with OF node and PCI device simutaneously. The EEH devices would
> - * be foundamental information for EEH core components to work proerly. Besides,
> - * We have to support multiple situations where dynamic creation of EEH device
> - * is required:
> - *
> - * 1) Before PCI emunation starts, we need create EEH devices according to the
> - *    PCI sensitive OF nodes.
> - * 2) When PCI emunation is done, we need do the binding between PCI device and
> - *    the associated EEH device.
> - * 3) DR (Dynamic Reconfiguration) would create PCI sensitive OF node. EEH device
> - *    will be created while PCI sensitive OF node is detected from DR.
> - * 4) PCI hotplug needs redoing the binding between PCI device and EEH device. If
> - *    PHB is newly inserted, we also need create EEH devices accordingly.
> - *
> - * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> - */
> -
> -#include <linux/export.h>
> -#include <linux/gfp.h>
> -#include <linux/init.h>
> -#include <linux/kernel.h>
> -#include <linux/pci.h>
> -#include <linux/string.h>
> -
> -#include <asm/pci-bridge.h>
> -#include <asm/ppc-pci.h>
> -
> -/**
> - * eeh_dev_init - Create EEH device according to OF node
> - * @dn: device node
> - * @data: PHB
> - *
> - * It will create EEH device according to the given OF node. The function
> - * might be called by PCI emunation, DR, PHB hotplug.
> - */
> -void *eeh_dev_init(struct device_node *dn, void *data)
> -{
> -	struct pci_controller *phb = data;
> -	struct eeh_dev *edev;
> -
> -	/* Allocate EEH device */
> -	edev = kzalloc(sizeof(*edev), GFP_KERNEL);
> -	if (!edev) {
> -		pr_warning("%s: out of memory\n", __func__);
> -		return NULL;
> -	}
> -
> -	/* Associate EEH device with OF node */
> -	PCI_DN(dn)->edev = edev;
> -	edev->dn  = dn;
> -	edev->phb = phb;
> -	INIT_LIST_HEAD(&edev->list);
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_dev_phb_init_dynamic - Create EEH devices for devices included in PHB
> - * @phb: PHB
> - *
> - * Scan the PHB OF node and its child association, then create the
> - * EEH devices accordingly
> - */
> -void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
> -{
> -	struct device_node *dn = phb->dn;
> -
> -	/* EEH PE for PHB */
> -	eeh_phb_pe_create(phb);
> -
> -	/* EEH device for PHB */
> -	eeh_dev_init(dn, phb);
> -
> -	/* EEH devices for children OF nodes */
> -	traverse_pci_devices(dn, eeh_dev_init, phb);
> -}
> -
> -/**
> - * eeh_dev_phb_init - Create EEH devices for devices included in existing PHBs
> - *
> - * Scan all the existing PHBs and create EEH devices for their OF
> - * nodes and their children OF nodes
> - */
> -static int __init eeh_dev_phb_init(void)
> -{
> -	struct pci_controller *phb, *tmp;
> -
> -	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
> -		eeh_dev_phb_init_dynamic(phb);
> -
> -	pr_info("EEH: devices created\n");
> -
> -	return 0;
> -}
> -
> -core_initcall(eeh_dev_phb_init);
> diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c
> deleted file mode 100644
> index a3fefb6..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_driver.c
> +++ /dev/null
> @@ -1,552 +0,0 @@
> -/*
> - * PCI Error Recovery Driver for RPA-compliant PPC64 platform.
> - * Copyright IBM Corp. 2004 2005
> - * Copyright Linas Vepstas <linas@linas.org> 2004, 2005
> - *
> - * All rights reserved.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or (at
> - * your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> - * NON INFRINGEMENT.  See the GNU General Public License for more
> - * details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> - *
> - * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> - */
> -#include <linux/delay.h>
> -#include <linux/interrupt.h>
> -#include <linux/irq.h>
> -#include <linux/module.h>
> -#include <linux/pci.h>
> -#include <asm/eeh.h>
> -#include <asm/eeh_event.h>
> -#include <asm/ppc-pci.h>
> -#include <asm/pci-bridge.h>
> -#include <asm/prom.h>
> -#include <asm/rtas.h>
> -
> -/**
> - * eeh_pcid_name - Retrieve name of PCI device driver
> - * @pdev: PCI device
> - *
> - * This routine is used to retrieve the name of PCI device driver
> - * if that's valid.
> - */
> -static inline const char *eeh_pcid_name(struct pci_dev *pdev)
> -{
> -	if (pdev && pdev->dev.driver)
> -		return pdev->dev.driver->name;
> -	return "";
> -}
> -
> -/**
> - * eeh_pcid_get - Get the PCI device driver
> - * @pdev: PCI device
> - *
> - * The function is used to retrieve the PCI device driver for
> - * the indicated PCI device. Besides, we will increase the reference
> - * of the PCI device driver to prevent that being unloaded on
> - * the fly. Otherwise, kernel crash would be seen.
> - */
> -static inline struct pci_driver *eeh_pcid_get(struct pci_dev *pdev)
> -{
> -	if (!pdev || !pdev->driver)
> -		return NULL;
> -
> -	if (!try_module_get(pdev->driver->driver.owner))
> -		return NULL;
> -
> -	return pdev->driver;
> -}
> -
> -/**
> - * eeh_pcid_put - Dereference on the PCI device driver
> - * @pdev: PCI device
> - *
> - * The function is called to do dereference on the PCI device
> - * driver of the indicated PCI device.
> - */
> -static inline void eeh_pcid_put(struct pci_dev *pdev)
> -{
> -	if (!pdev || !pdev->driver)
> -		return;
> -
> -	module_put(pdev->driver->driver.owner);
> -}
> -
> -#if 0
> -static void print_device_node_tree(struct pci_dn *pdn, int dent)
> -{
> -	int i;
> -	struct device_node *pc;
> -
> -	if (!pdn)
> -		return;
> -	for (i = 0; i < dent; i++)
> -		printk(" ");
> -	printk("dn=%s mode=%x \tcfg_addr=%x pe_addr=%x \tfull=%s\n",
> -		pdn->node->name, pdn->eeh_mode, pdn->eeh_config_addr,
> -		pdn->eeh_pe_config_addr, pdn->node->full_name);
> -	dent += 3;
> -	pc = pdn->node->child;
> -	while (pc) {
> -		print_device_node_tree(PCI_DN(pc), dent);
> -		pc = pc->sibling;
> -	}
> -}
> -#endif
> -
> -/**
> - * eeh_disable_irq - Disable interrupt for the recovering device
> - * @dev: PCI device
> - *
> - * This routine must be called when reporting temporary or permanent
> - * error to the particular PCI device to disable interrupt of that
> - * device. If the device has enabled MSI or MSI-X interrupt, we needn't
> - * do real work because EEH should freeze DMA transfers for those PCI
> - * devices encountering EEH errors, which includes MSI or MSI-X.
> - */
> -static void eeh_disable_irq(struct pci_dev *dev)
> -{
> -	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> -
> -	/* Don't disable MSI and MSI-X interrupts. They are
> -	 * effectively disabled by the DMA Stopped state
> -	 * when an EEH error occurs.
> -	 */
> -	if (dev->msi_enabled || dev->msix_enabled)
> -		return;
> -
> -	if (!irq_has_action(dev->irq))
> -		return;
> -
> -	edev->mode |= EEH_DEV_IRQ_DISABLED;
> -	disable_irq_nosync(dev->irq);
> -}
> -
> -/**
> - * eeh_enable_irq - Enable interrupt for the recovering device
> - * @dev: PCI device
> - *
> - * This routine must be called to enable interrupt while failed
> - * device could be resumed.
> - */
> -static void eeh_enable_irq(struct pci_dev *dev)
> -{
> -	struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
> -
> -	if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
> -		edev->mode &= ~EEH_DEV_IRQ_DISABLED;
> -		enable_irq(dev->irq);
> -	}
> -}
> -
> -/**
> - * eeh_report_error - Report pci error to each device driver
> - * @data: eeh device
> - * @userdata: return value
> - *
> - * Report an EEH error to each device driver, collect up and
> - * merge the device driver responses. Cumulative response
> - * passed back in "userdata".
> - */
> -static void *eeh_report_error(void *data, void *userdata)
> -{
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	enum pci_ers_result rc, *res = userdata;
> -	struct pci_driver *driver;
> -
> -	/* We might not have the associated PCI device,
> -	 * then we should continue for next one.
> -	 */
> -	if (!dev) return NULL;
> -	dev->error_state = pci_channel_io_frozen;
> -
> -	driver = eeh_pcid_get(dev);
> -	if (!driver) return NULL;
> -
> -	eeh_disable_irq(dev);
> -
> -	if (!driver->err_handler ||
> -	    !driver->err_handler->error_detected) {
> -		eeh_pcid_put(dev);
> -		return NULL;
> -	}
> -
> -	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
> -
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> -
> -	eeh_pcid_put(dev);
> -	return NULL;
> -}
> -
> -/**
> - * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
> - * @data: eeh device
> - * @userdata: return value
> - *
> - * Tells each device driver that IO ports, MMIO and config space I/O
> - * are now enabled. Collects up and merges the device driver responses.
> - * Cumulative response passed back in "userdata".
> - */
> -static void *eeh_report_mmio_enabled(void *data, void *userdata)
> -{
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	enum pci_ers_result rc, *res = userdata;
> -	struct pci_driver *driver;
> -
> -	driver = eeh_pcid_get(dev);
> -	if (!driver) return NULL;
> -
> -	if (!driver->err_handler ||
> -	    !driver->err_handler->mmio_enabled) {
> -		eeh_pcid_put(dev);
> -		return NULL;
> -	}
> -
> -	rc = driver->err_handler->mmio_enabled(dev);
> -
> -	/* A driver that needs a reset trumps all others */
> -	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> -
> -	eeh_pcid_put(dev);
> -	return NULL;
> -}
> -
> -/**
> - * eeh_report_reset - Tell device that slot has been reset
> - * @data: eeh device
> - * @userdata: return value
> - *
> - * This routine must be called while EEH tries to reset particular
> - * PCI device so that the associated PCI device driver could take
> - * some actions, usually to save data the driver needs so that the
> - * driver can work again while the device is recovered.
> - */
> -static void *eeh_report_reset(void *data, void *userdata)
> -{
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	enum pci_ers_result rc, *res = userdata;
> -	struct pci_driver *driver;
> -
> -	if (!dev) return NULL;
> -	dev->error_state = pci_channel_io_normal;
> -
> -	driver = eeh_pcid_get(dev);
> -	if (!driver) return NULL;
> -
> -	eeh_enable_irq(dev);
> -
> -	if (!driver->err_handler ||
> -	    !driver->err_handler->slot_reset) {
> -		eeh_pcid_put(dev);
> -		return NULL;
> -	}
> -
> -	rc = driver->err_handler->slot_reset(dev);
> -	if ((*res == PCI_ERS_RESULT_NONE) ||
> -	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> -	if (*res == PCI_ERS_RESULT_DISCONNECT &&
> -	     rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> -
> -	eeh_pcid_put(dev);
> -	return NULL;
> -}
> -
> -/**
> - * eeh_report_resume - Tell device to resume normal operations
> - * @data: eeh device
> - * @userdata: return value
> - *
> - * This routine must be called to notify the device driver that it
> - * could resume so that the device driver can do some initialization
> - * to make the recovered device work again.
> - */
> -static void *eeh_report_resume(void *data, void *userdata)
> -{
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	struct pci_driver *driver;
> -
> -	if (!dev) return NULL;
> -	dev->error_state = pci_channel_io_normal;
> -
> -	driver = eeh_pcid_get(dev);
> -	if (!driver) return NULL;
> -
> -	eeh_enable_irq(dev);
> -
> -	if (!driver->err_handler ||
> -	    !driver->err_handler->resume) {
> -		eeh_pcid_put(dev);
> -		return NULL;
> -	}
> -
> -	driver->err_handler->resume(dev);
> -
> -	eeh_pcid_put(dev);
> -	return NULL;
> -}
> -
> -/**
> - * eeh_report_failure - Tell device driver that device is dead.
> - * @data: eeh device
> - * @userdata: return value
> - *
> - * This informs the device driver that the device is permanently
> - * dead, and that no further recovery attempts will be made on it.
> - */
> -static void *eeh_report_failure(void *data, void *userdata)
> -{
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> -	struct pci_driver *driver;
> -
> -	if (!dev) return NULL;
> -	dev->error_state = pci_channel_io_perm_failure;
> -
> -	driver = eeh_pcid_get(dev);
> -	if (!driver) return NULL;
> -
> -	eeh_disable_irq(dev);
> -
> -	if (!driver->err_handler ||
> -	    !driver->err_handler->error_detected) {
> -		eeh_pcid_put(dev);
> -		return NULL;
> -	}
> -
> -	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
> -
> -	eeh_pcid_put(dev);
> -	return NULL;
> -}
> -
> -/**
> - * eeh_reset_device - Perform actual reset of a pci slot
> - * @pe: EEH PE
> - * @bus: PCI bus corresponding to the isolcated slot
> - *
> - * This routine must be called to do reset on the indicated PE.
> - * During the reset, udev might be invoked because those affected
> - * PCI devices will be removed and then added.
> - */
> -static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> -{
> -	int cnt, rc;
> -
> -	/* pcibios will clear the counter; save the value */
> -	cnt = pe->freeze_count;
> -
> -	/*
> -	 * We don't remove the corresponding PE instances because
> -	 * we need the information afterwords. The attached EEH
> -	 * devices are expected to be attached soon when calling
> -	 * into pcibios_add_pci_devices().
> -	 */
> -	if (bus)
> -		__pcibios_remove_pci_devices(bus, 0);
> -
> -	/* Reset the pci controller. (Asserts RST#; resets config space).
> -	 * Reconfigure bridges and devices. Don't try to bring the system
> -	 * up if the reset failed for some reason.
> -	 */
> -	rc = eeh_reset_pe(pe);
> -	if (rc)
> -		return rc;
> -
> -	/* Restore PE */
> -	eeh_ops->configure_bridge(pe);
> -	eeh_pe_restore_bars(pe);
> -
> -	/* Give the system 5 seconds to finish running the user-space
> -	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
> -	 * this is a hack, but if we don't do this, and try to bring
> -	 * the device up before the scripts have taken it down,
> -	 * potentially weird things happen.
> -	 */
> -	if (bus) {
> -		ssleep(5);
> -		pcibios_add_pci_devices(bus);
> -	}
> -	pe->freeze_count = cnt;
> -
> -	return 0;
> -}
> -
> -/* The longest amount of time to wait for a pci device
> - * to come back on line, in seconds.
> - */
> -#define MAX_WAIT_FOR_RECOVERY 150
> -
> -/**
> - * eeh_handle_event - Reset a PCI device after hard lockup.
> - * @pe: EEH PE
> - *
> - * While PHB detects address or data parity errors on particular PCI
> - * slot, the associated PE will be frozen. Besides, DMA's occurring
> - * to wild addresses (which usually happen due to bugs in device
> - * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
> - * #PERR or other misc PCI-related errors also can trigger EEH errors.
> - *
> - * Recovery process consists of unplugging the device driver (which
> - * generated hotplug events to userspace), then issuing a PCI #RST to
> - * the device, then reconfiguring the PCI config space for all bridges
> - * & devices under this slot, and then finally restarting the device
> - * drivers (which cause a second set of hotplug events to go out to
> - * userspace).
> - */
> -void eeh_handle_event(struct eeh_pe *pe)
> -{
> -	struct pci_bus *frozen_bus;
> -	int rc = 0;
> -	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
> -
> -	frozen_bus = eeh_pe_bus_get(pe);
> -	if (!frozen_bus) {
> -		pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
> -			__func__, pe->phb->global_number, pe->addr);
> -		return;
> -	}
> -
> -	pe->freeze_count++;
> -	if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES)
> -		goto excess_failures;
> -	pr_warning("EEH: This PCI device has failed %d times in the last hour\n",
> -		pe->freeze_count);
> -
> -	/* Walk the various device drivers attached to this slot through
> -	 * a reset sequence, giving each an opportunity to do what it needs
> -	 * to accomplish the reset.  Each child gets a report of the
> -	 * status ... if any child can't handle the reset, then the entire
> -	 * slot is dlpar removed and added.
> -	 */
> -	eeh_pe_dev_traverse(pe, eeh_report_error, &result);
> -
> -	/* Get the current PCI slot state. This can take a long time,
> -	 * sometimes over 3 seconds for certain systems.
> -	 */
> -	rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000);
> -	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
> -		printk(KERN_WARNING "EEH: Permanent failure\n");
> -		goto hard_fail;
> -	}
> -
> -	/* Since rtas may enable MMIO when posting the error log,
> -	 * don't post the error log until after all dev drivers
> -	 * have been informed.
> -	 */
> -	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
> -
> -	/* If all device drivers were EEH-unaware, then shut
> -	 * down all of the device drivers, and hope they
> -	 * go down willingly, without panicing the system.
> -	 */
> -	if (result == PCI_ERS_RESULT_NONE) {
> -		rc = eeh_reset_device(pe, frozen_bus);
> -		if (rc) {
> -			printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc);
> -			goto hard_fail;
> -		}
> -	}
> -
> -	/* If all devices reported they can proceed, then re-enable MMIO */
> -	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
> -		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
> -
> -		if (rc < 0)
> -			goto hard_fail;
> -		if (rc) {
> -			result = PCI_ERS_RESULT_NEED_RESET;
> -		} else {
> -			result = PCI_ERS_RESULT_NONE;
> -			eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result);
> -		}
> -	}
> -
> -	/* If all devices reported they can proceed, then re-enable DMA */
> -	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
> -		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
> -
> -		if (rc < 0)
> -			goto hard_fail;
> -		if (rc)
> -			result = PCI_ERS_RESULT_NEED_RESET;
> -		else
> -			result = PCI_ERS_RESULT_RECOVERED;
> -	}
> -
> -	/* If any device has a hard failure, then shut off everything. */
> -	if (result == PCI_ERS_RESULT_DISCONNECT) {
> -		printk(KERN_WARNING "EEH: Device driver gave up\n");
> -		goto hard_fail;
> -	}
> -
> -	/* If any device called out for a reset, then reset the slot */
> -	if (result == PCI_ERS_RESULT_NEED_RESET) {
> -		rc = eeh_reset_device(pe, NULL);
> -		if (rc) {
> -			printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc);
> -			goto hard_fail;
> -		}
> -		result = PCI_ERS_RESULT_NONE;
> -		eeh_pe_dev_traverse(pe, eeh_report_reset, &result);
> -	}
> -
> -	/* All devices should claim they have recovered by now. */
> -	if ((result != PCI_ERS_RESULT_RECOVERED) &&
> -	    (result != PCI_ERS_RESULT_NONE)) {
> -		printk(KERN_WARNING "EEH: Not recovered\n");
> -		goto hard_fail;
> -	}
> -
> -	/* Tell all device drivers that they can resume operations */
> -	eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> -
> -	return;
> -	
> -excess_failures:
> -	/*
> -	 * About 90% of all real-life EEH failures in the field
> -	 * are due to poorly seated PCI cards. Only 10% or so are
> -	 * due to actual, failed cards.
> -	 */
> -	pr_err("EEH: PHB#%d-PE#%x has failed %d times in the\n"
> -	       "last hour and has been permanently disabled.\n"
> -	       "Please try reseating or replacing it.\n",
> -		pe->phb->global_number, pe->addr,
> -		pe->freeze_count);
> -	goto perm_error;
> -
> -hard_fail:
> -	pr_err("EEH: Unable to recover from failure from PHB#%d-PE#%x.\n"
> -	       "Please try reseating or replacing it\n",
> -		pe->phb->global_number, pe->addr);
> -
> -perm_error:
> -	eeh_slot_error_detail(pe, EEH_LOG_PERM);
> -
> -	/* Notify all devices that they're about to go down. */
> -	eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
> -
> -	/* Shut down the device drivers for good. */
> -	if (frozen_bus)
> -		pcibios_remove_pci_devices(frozen_bus);
> -}
> -
> diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
> deleted file mode 100644
> index 185bedd..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_event.c
> +++ /dev/null
> @@ -1,142 +0,0 @@
> -/*
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> - *
> - * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
> - */
> -
> -#include <linux/delay.h>
> -#include <linux/list.h>
> -#include <linux/mutex.h>
> -#include <linux/sched.h>
> -#include <linux/pci.h>
> -#include <linux/slab.h>
> -#include <linux/workqueue.h>
> -#include <linux/kthread.h>
> -#include <asm/eeh_event.h>
> -#include <asm/ppc-pci.h>
> -
> -/** Overview:
> - *  EEH error states may be detected within exception handlers;
> - *  however, the recovery processing needs to occur asynchronously
> - *  in a normal kernel context and not an interrupt context.
> - *  This pair of routines creates an event and queues it onto a
> - *  work-queue, where a worker thread can drive recovery.
> - */
> -
> -/* EEH event workqueue setup. */
> -static DEFINE_SPINLOCK(eeh_eventlist_lock);
> -LIST_HEAD(eeh_eventlist);
> -static void eeh_thread_launcher(struct work_struct *);
> -DECLARE_WORK(eeh_event_wq, eeh_thread_launcher);
> -
> -/* Serialize reset sequences for a given pci device */
> -DEFINE_MUTEX(eeh_event_mutex);
> -
> -/**
> - * eeh_event_handler - Dispatch EEH events.
> - * @dummy - unused
> - *
> - * The detection of a frozen slot can occur inside an interrupt,
> - * where it can be hard to do anything about it.  The goal of this
> - * routine is to pull these detection events out of the context
> - * of the interrupt handler, and re-dispatch them for processing
> - * at a later time in a normal context.
> - */
> -static int eeh_event_handler(void * dummy)
> -{
> -	unsigned long flags;
> -	struct eeh_event *event;
> -	struct eeh_pe *pe;
> -
> -	spin_lock_irqsave(&eeh_eventlist_lock, flags);
> -	event = NULL;
> -
> -	/* Unqueue the event, get ready to process. */
> -	if (!list_empty(&eeh_eventlist)) {
> -		event = list_entry(eeh_eventlist.next, struct eeh_event, list);
> -		list_del(&event->list);
> -	}
> -	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
> -
> -	if (event == NULL)
> -		return 0;
> -
> -	/* Serialize processing of EEH events */
> -	mutex_lock(&eeh_event_mutex);
> -	pe = event->pe;
> -	eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
> -	pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n",
> -		pe->phb->global_number, pe->addr);
> -
> -	set_current_state(TASK_INTERRUPTIBLE);	/* Don't add to load average */
> -	eeh_handle_event(pe);
> -	eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
> -
> -	kfree(event);
> -	mutex_unlock(&eeh_event_mutex);
> -
> -	/* If there are no new errors after an hour, clear the counter. */
> -	if (pe && pe->freeze_count > 0) {
> -		msleep_interruptible(3600*1000);
> -		if (pe->freeze_count > 0)
> -			pe->freeze_count--;
> -
> -	}
> -
> -	return 0;
> -}
> -
> -/**
> - * eeh_thread_launcher - Start kernel thread to handle EEH events
> - * @dummy - unused
> - *
> - * This routine is called to start the kernel thread for processing
> - * EEH event.
> - */
> -static void eeh_thread_launcher(struct work_struct *dummy)
> -{
> -	if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd")))
> -		printk(KERN_ERR "Failed to start EEH daemon\n");
> -}
> -
> -/**
> - * eeh_send_failure_event - Generate a PCI error event
> - * @pe: EEH PE
> - *
> - * This routine can be called within an interrupt context;
> - * the actual event will be delivered in a normal context
> - * (from a workqueue).
> - */
> -int eeh_send_failure_event(struct eeh_pe *pe)
> -{
> -	unsigned long flags;
> -	struct eeh_event *event;
> -
> -	event = kzalloc(sizeof(*event), GFP_ATOMIC);
> -	if (!event) {
> -		pr_err("EEH: out of memory, event not handled\n");
> -		return -ENOMEM;
> -	}
> -	event->pe = pe;
> -
> -	/* We may or may not be called in an interrupt context */
> -	spin_lock_irqsave(&eeh_eventlist_lock, flags);
> -	list_add(&event->list, &eeh_eventlist);
> -	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
> -
> -	schedule_work(&eeh_event_wq);
> -
> -	return 0;
> -}
> diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
> deleted file mode 100644
> index 9d4a9e8..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_pe.c
> +++ /dev/null
> @@ -1,653 +0,0 @@
> -/*
> - * The file intends to implement PE based on the information from
> - * platforms. Basically, there have 3 types of PEs: PHB/Bus/Device.
> - * All the PEs should be organized as hierarchy tree. The first level
> - * of the tree will be associated to existing PHBs since the particular
> - * PE is only meaningful in one PHB domain.
> - *
> - * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> - */
> -
> -#include <linux/export.h>
> -#include <linux/gfp.h>
> -#include <linux/init.h>
> -#include <linux/kernel.h>
> -#include <linux/pci.h>
> -#include <linux/string.h>
> -
> -#include <asm/pci-bridge.h>
> -#include <asm/ppc-pci.h>
> -
> -static LIST_HEAD(eeh_phb_pe);
> -
> -/**
> - * eeh_pe_alloc - Allocate PE
> - * @phb: PCI controller
> - * @type: PE type
> - *
> - * Allocate PE instance dynamically.
> - */
> -static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
> -{
> -	struct eeh_pe *pe;
> -
> -	/* Allocate PHB PE */
> -	pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
> -	if (!pe) return NULL;
> -
> -	/* Initialize PHB PE */
> -	pe->type = type;
> -	pe->phb = phb;
> -	INIT_LIST_HEAD(&pe->child_list);
> -	INIT_LIST_HEAD(&pe->child);
> -	INIT_LIST_HEAD(&pe->edevs);
> -
> -	return pe;
> -}
> -
> -/**
> - * eeh_phb_pe_create - Create PHB PE
> - * @phb: PCI controller
> - *
> - * The function should be called while the PHB is detected during
> - * system boot or PCI hotplug in order to create PHB PE.
> - */
> -int eeh_phb_pe_create(struct pci_controller *phb)
> -{
> -	struct eeh_pe *pe;
> -
> -	/* Allocate PHB PE */
> -	pe = eeh_pe_alloc(phb, EEH_PE_PHB);
> -	if (!pe) {
> -		pr_err("%s: out of memory!\n", __func__);
> -		return -ENOMEM;
> -	}
> -
> -	/* Put it into the list */
> -	eeh_lock();
> -	list_add_tail(&pe->child, &eeh_phb_pe);
> -	eeh_unlock();
> -
> -	pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number);
> -
> -	return 0;
> -}
> -
> -/**
> - * eeh_phb_pe_get - Retrieve PHB PE based on the given PHB
> - * @phb: PCI controller
> - *
> - * The overall PEs form hierarchy tree. The first layer of the
> - * hierarchy tree is composed of PHB PEs. The function is used
> - * to retrieve the corresponding PHB PE according to the given PHB.
> - */
> -static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
> -{
> -	struct eeh_pe *pe;
> -
> -	list_for_each_entry(pe, &eeh_phb_pe, child) {
> -		/*
> -		 * Actually, we needn't check the type since
> -		 * the PE for PHB has been determined when that
> -		 * was created.
> -		 */
> -		if ((pe->type & EEH_PE_PHB) && pe->phb == phb)
> -			return pe;
> -	}
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_next - Retrieve the next PE in the tree
> - * @pe: current PE
> - * @root: root PE
> - *
> - * The function is used to retrieve the next PE in the
> - * hierarchy PE tree.
> - */
> -static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
> -				  struct eeh_pe *root)
> -{
> -	struct list_head *next = pe->child_list.next;
> -
> -	if (next == &pe->child_list) {
> -		while (1) {
> -			if (pe == root)
> -				return NULL;
> -			next = pe->child.next;
> -			if (next != &pe->parent->child_list)
> -				break;
> -			pe = pe->parent;
> -		}
> -	}
> -
> -	return list_entry(next, struct eeh_pe, child);
> -}
> -
> -/**
> - * eeh_pe_traverse - Traverse PEs in the specified PHB
> - * @root: root PE
> - * @fn: callback
> - * @flag: extra parameter to callback
> - *
> - * The function is used to traverse the specified PE and its
> - * child PEs. The traversing is to be terminated once the
> - * callback returns something other than NULL, or no more PEs
> - * to be traversed.
> - */
> -static void *eeh_pe_traverse(struct eeh_pe *root,
> -			eeh_traverse_func fn, void *flag)
> -{
> -	struct eeh_pe *pe;
> -	void *ret;
> -
> -	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
> -		ret = fn(pe, flag);
> -		if (ret) return ret;
> -	}
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_dev_traverse - Traverse the devices from the PE
> - * @root: EEH PE
> - * @fn: function callback
> - * @flag: extra parameter to callback
> - *
> - * The function is used to traverse the devices of the specified
> - * PE and its child PEs.
> - */
> -void *eeh_pe_dev_traverse(struct eeh_pe *root,
> -		eeh_traverse_func fn, void *flag)
> -{
> -	struct eeh_pe *pe;
> -	struct eeh_dev *edev;
> -	void *ret;
> -
> -	if (!root) {
> -		pr_warning("%s: Invalid PE %p\n", __func__, root);
> -		return NULL;
> -	}
> -
> -	eeh_lock();
> -
> -	/* Traverse root PE */
> -	for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
> -		eeh_pe_for_each_dev(pe, edev) {
> -			ret = fn(edev, flag);
> -			if (ret) {
> -				eeh_unlock();
> -				return ret;
> -			}
> -		}
> -	}
> -
> -	eeh_unlock();
> -
> -	return NULL;
> -}
> -
> -/**
> - * __eeh_pe_get - Check the PE address
> - * @data: EEH PE
> - * @flag: EEH device
> - *
> - * For one particular PE, it can be identified by PE address
> - * or tranditional BDF address. BDF address is composed of
> - * Bus/Device/Function number. The extra data referred by flag
> - * indicates which type of address should be used.
> - */
> -static void *__eeh_pe_get(void *data, void *flag)
> -{
> -	struct eeh_pe *pe = (struct eeh_pe *)data;
> -	struct eeh_dev *edev = (struct eeh_dev *)flag;
> -
> -	/* Unexpected PHB PE */
> -	if (pe->type & EEH_PE_PHB)
> -		return NULL;
> -
> -	/* We prefer PE address */
> -	if (edev->pe_config_addr &&
> -	   (edev->pe_config_addr == pe->addr))
> -		return pe;
> -
> -	/* Try BDF address */
> -	if (edev->pe_config_addr &&
> -	   (edev->config_addr == pe->config_addr))
> -		return pe;
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_get - Search PE based on the given address
> - * @edev: EEH device
> - *
> - * Search the corresponding PE based on the specified address which
> - * is included in the eeh device. The function is used to check if
> - * the associated PE has been created against the PE address. It's
> - * notable that the PE address has 2 format: traditional PE address
> - * which is composed of PCI bus/device/function number, or unified
> - * PE address.
> - */
> -static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
> -{
> -	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
> -	struct eeh_pe *pe;
> -
> -	pe = eeh_pe_traverse(root, __eeh_pe_get, edev);
> -
> -	return pe;
> -}
> -
> -/**
> - * eeh_pe_get_parent - Retrieve the parent PE
> - * @edev: EEH device
> - *
> - * The whole PEs existing in the system are organized as hierarchy
> - * tree. The function is used to retrieve the parent PE according
> - * to the parent EEH device.
> - */
> -static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
> -{
> -	struct device_node *dn;
> -	struct eeh_dev *parent;
> -
> -	/*
> -	 * It might have the case for the indirect parent
> -	 * EEH device already having associated PE, but
> -	 * the direct parent EEH device doesn't have yet.
> -	 */
> -	dn = edev->dn->parent;
> -	while (dn) {
> -		/* We're poking out of PCI territory */
> -		if (!PCI_DN(dn)) return NULL;
> -
> -		parent = of_node_to_eeh_dev(dn);
> -		/* We're poking out of PCI territory */
> -		if (!parent) return NULL;
> -
> -		if (parent->pe)
> -			return parent->pe;
> -
> -		dn = dn->parent;
> -	}
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_add_to_parent_pe - Add EEH device to parent PE
> - * @edev: EEH device
> - *
> - * Add EEH device to the parent PE. If the parent PE already
> - * exists, the PE type will be changed to EEH_PE_BUS. Otherwise,
> - * we have to create new PE to hold the EEH device and the new
> - * PE will be linked to its parent PE as well.
> - */
> -int eeh_add_to_parent_pe(struct eeh_dev *edev)
> -{
> -	struct eeh_pe *pe, *parent;
> -
> -	eeh_lock();
> -
> -	/*
> -	 * Search the PE has been existing or not according
> -	 * to the PE address. If that has been existing, the
> -	 * PE should be composed of PCI bus and its subordinate
> -	 * components.
> -	 */
> -	pe = eeh_pe_get(edev);
> -	if (pe && !(pe->type & EEH_PE_INVALID)) {
> -		if (!edev->pe_config_addr) {
> -			eeh_unlock();
> -			pr_err("%s: PE with addr 0x%x already exists\n",
> -				__func__, edev->config_addr);
> -			return -EEXIST;
> -		}
> -
> -		/* Mark the PE as type of PCI bus */
> -		pe->type = EEH_PE_BUS;
> -		edev->pe = pe;
> -
> -		/* Put the edev to PE */
> -		list_add_tail(&edev->list, &pe->edevs);
> -		eeh_unlock();
> -		pr_debug("EEH: Add %s to Bus PE#%x\n",
> -			edev->dn->full_name, pe->addr);
> -
> -		return 0;
> -	} else if (pe && (pe->type & EEH_PE_INVALID)) {
> -		list_add_tail(&edev->list, &pe->edevs);
> -		edev->pe = pe;
> -		/*
> -		 * We're running to here because of PCI hotplug caused by
> -		 * EEH recovery. We need clear EEH_PE_INVALID until the top.
> -		 */
> -		parent = pe;
> -		while (parent) {
> -			if (!(parent->type & EEH_PE_INVALID))
> -				break;
> -			parent->type &= ~EEH_PE_INVALID;
> -			parent = parent->parent;
> -		}
> -		eeh_unlock();
> -		pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
> -			edev->dn->full_name, pe->addr, pe->parent->addr);
> -
> -		return 0;
> -	}
> -
> -	/* Create a new EEH PE */
> -	pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> -	if (!pe) {
> -		eeh_unlock();
> -		pr_err("%s: out of memory!\n", __func__);
> -		return -ENOMEM;
> -	}
> -	pe->addr	= edev->pe_config_addr;
> -	pe->config_addr	= edev->config_addr;
> -
> -	/*
> -	 * Put the new EEH PE into hierarchy tree. If the parent
> -	 * can't be found, the newly created PE will be attached
> -	 * to PHB directly. Otherwise, we have to associate the
> -	 * PE with its parent.
> -	 */
> -	parent = eeh_pe_get_parent(edev);
> -	if (!parent) {
> -		parent = eeh_phb_pe_get(edev->phb);
> -		if (!parent) {
> -			eeh_unlock();
> -			pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
> -				__func__, edev->phb->global_number);
> -			edev->pe = NULL;
> -			kfree(pe);
> -			return -EEXIST;
> -		}
> -	}
> -	pe->parent = parent;
> -
> -	/*
> -	 * Put the newly created PE into the child list and
> -	 * link the EEH device accordingly.
> -	 */
> -	list_add_tail(&pe->child, &parent->child_list);
> -	list_add_tail(&edev->list, &pe->edevs);
> -	edev->pe = pe;
> -	eeh_unlock();
> -	pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n",
> -		edev->dn->full_name, pe->addr, pe->parent->addr);
> -
> -	return 0;
> -}
> -
> -/**
> - * eeh_rmv_from_parent_pe - Remove one EEH device from the associated PE
> - * @edev: EEH device
> - * @purge_pe: remove PE or not
> - *
> - * The PE hierarchy tree might be changed when doing PCI hotplug.
> - * Also, the PCI devices or buses could be removed from the system
> - * during EEH recovery. So we have to call the function remove the
> - * corresponding PE accordingly if necessary.
> - */
> -int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe)
> -{
> -	struct eeh_pe *pe, *parent, *child;
> -	int cnt;
> -
> -	if (!edev->pe) {
> -		pr_warning("%s: No PE found for EEH device %s\n",
> -			__func__, edev->dn->full_name);
> -		return -EEXIST;
> -	}
> -
> -	eeh_lock();
> -
> -	/* Remove the EEH device */
> -	pe = edev->pe;
> -	edev->pe = NULL;
> -	list_del(&edev->list);
> -
> -	/*
> -	 * Check if the parent PE includes any EEH devices.
> -	 * If not, we should delete that. Also, we should
> -	 * delete the parent PE if it doesn't have associated
> -	 * child PEs and EEH devices.
> -	 */
> -	while (1) {
> -		parent = pe->parent;
> -		if (pe->type & EEH_PE_PHB)
> -			break;
> -
> -		if (purge_pe) {
> -			if (list_empty(&pe->edevs) &&
> -			    list_empty(&pe->child_list)) {
> -				list_del(&pe->child);
> -				kfree(pe);
> -			} else {
> -				break;
> -			}
> -		} else {
> -			if (list_empty(&pe->edevs)) {
> -				cnt = 0;
> -				list_for_each_entry(child, &pe->child_list, child) {
> -					if (!(child->type & EEH_PE_INVALID)) {
> -						cnt++;
> -						break;
> -					}
> -				}
> -
> -				if (!cnt)
> -					pe->type |= EEH_PE_INVALID;
> -				else
> -					break;
> -			}
> -		}
> -
> -		pe = parent;
> -	}
> -
> -	eeh_unlock();
> -
> -	return 0;
> -}
> -
> -/**
> - * __eeh_pe_state_mark - Mark the state for the PE
> - * @data: EEH PE
> - * @flag: state
> - *
> - * The function is used to mark the indicated state for the given
> - * PE. Also, the associated PCI devices will be put into IO frozen
> - * state as well.
> - */
> -static void *__eeh_pe_state_mark(void *data, void *flag)
> -{
> -	struct eeh_pe *pe = (struct eeh_pe *)data;
> -	int state = *((int *)flag);
> -	struct eeh_dev *tmp;
> -	struct pci_dev *pdev;
> -
> -	/*
> -	 * Mark the PE with the indicated state. Also,
> -	 * the associated PCI device will be put into
> -	 * I/O frozen state to avoid I/O accesses from
> -	 * the PCI device driver.
> -	 */
> -	pe->state |= state;
> -	eeh_pe_for_each_dev(pe, tmp) {
> -		pdev = eeh_dev_to_pci_dev(tmp);
> -		if (pdev)
> -			pdev->error_state = pci_channel_io_frozen;
> -	}
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_state_mark - Mark specified state for PE and its associated device
> - * @pe: EEH PE
> - *
> - * EEH error affects the current PE and its child PEs. The function
> - * is used to mark appropriate state for the affected PEs and the
> - * associated devices.
> - */
> -void eeh_pe_state_mark(struct eeh_pe *pe, int state)
> -{
> -	eeh_lock();
> -	eeh_pe_traverse(pe, __eeh_pe_state_mark, &state);
> -	eeh_unlock();
> -}
> -
> -/**
> - * __eeh_pe_state_clear - Clear state for the PE
> - * @data: EEH PE
> - * @flag: state
> - *
> - * The function is used to clear the indicated state from the
> - * given PE. Besides, we also clear the check count of the PE
> - * as well.
> - */
> -static void *__eeh_pe_state_clear(void *data, void *flag)
> -{
> -	struct eeh_pe *pe = (struct eeh_pe *)data;
> -	int state = *((int *)flag);
> -
> -	pe->state &= ~state;
> -	pe->check_count = 0;
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_state_clear - Clear state for the PE and its children
> - * @pe: PE
> - * @state: state to be cleared
> - *
> - * When the PE and its children has been recovered from error,
> - * we need clear the error state for that. The function is used
> - * for the purpose.
> - */
> -void eeh_pe_state_clear(struct eeh_pe *pe, int state)
> -{
> -	eeh_lock();
> -	eeh_pe_traverse(pe, __eeh_pe_state_clear, &state);
> -	eeh_unlock();
> -}
> -
> -/**
> - * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
> - * @data: EEH device
> - * @flag: Unused
> - *
> - * Loads the PCI configuration space base address registers,
> - * the expansion ROM base address, the latency timer, and etc.
> - * from the saved values in the device node.
> - */
> -static void *eeh_restore_one_device_bars(void *data, void *flag)
> -{
> -	int i;
> -	u32 cmd;
> -	struct eeh_dev *edev = (struct eeh_dev *)data;
> -	struct device_node *dn = eeh_dev_to_of_node(edev);
> -
> -	for (i = 4; i < 10; i++)
> -		eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
> -	/* 12 == Expansion ROM Address */
> -	eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]);
> -
> -#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
> -#define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
> -
> -	eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1,
> -		SAVED_BYTE(PCI_CACHE_LINE_SIZE));
> -	eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1,
> -		SAVED_BYTE(PCI_LATENCY_TIMER));
> -
> -	/* max latency, min grant, interrupt pin and line */
> -	eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]);
> -
> -	/*
> -	 * Restore PERR & SERR bits, some devices require it,
> -	 * don't touch the other command bits
> -	 */
> -	eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
> -	if (edev->config_space[1] & PCI_COMMAND_PARITY)
> -		cmd |= PCI_COMMAND_PARITY;
> -	else
> -		cmd &= ~PCI_COMMAND_PARITY;
> -	if (edev->config_space[1] & PCI_COMMAND_SERR)
> -		cmd |= PCI_COMMAND_SERR;
> -	else
> -		cmd &= ~PCI_COMMAND_SERR;
> -	eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
> -
> -	return NULL;
> -}
> -
> -/**
> - * eeh_pe_restore_bars - Restore the PCI config space info
> - * @pe: EEH PE
> - *
> - * This routine performs a recursive walk to the children
> - * of this device as well.
> - */
> -void eeh_pe_restore_bars(struct eeh_pe *pe)
> -{
> -	/*
> -	 * We needn't take the EEH lock since eeh_pe_dev_traverse()
> -	 * will take that.
> -	 */
> -	eeh_pe_dev_traverse(pe, eeh_restore_one_device_bars, NULL);
> -}
> -
> -/**
> - * eeh_pe_bus_get - Retrieve PCI bus according to the given PE
> - * @pe: EEH PE
> - *
> - * Retrieve the PCI bus according to the given PE. Basically,
> - * there're 3 types of PEs: PHB/Bus/Device. For PHB PE, the
> - * primary PCI bus will be retrieved. The parent bus will be
> - * returned for BUS PE. However, we don't have associated PCI
> - * bus for DEVICE PE.
> - */
> -struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
> -{
> -	struct pci_bus *bus = NULL;
> -	struct eeh_dev *edev;
> -	struct pci_dev *pdev;
> -
> -	eeh_lock();
> -
> -	if (pe->type & EEH_PE_PHB) {
> -		bus = pe->phb->bus;
> -	} else if (pe->type & EEH_PE_BUS ||
> -		   pe->type & EEH_PE_DEVICE) {
> -		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
> -		pdev = eeh_dev_to_pci_dev(edev);
> -		if (pdev)
> -			bus = pdev->bus;
> -	}
> -
> -	eeh_unlock();
> -
> -	return bus;
> -}
> diff --git a/arch/powerpc/platforms/pseries/eeh_sysfs.c b/arch/powerpc/platforms/pseries/eeh_sysfs.c
> deleted file mode 100644
> index d377083..0000000
> --- a/arch/powerpc/platforms/pseries/eeh_sysfs.c
> +++ /dev/null
> @@ -1,75 +0,0 @@
> -/*
> - * Sysfs entries for PCI Error Recovery for PAPR-compliant platform.
> - * Copyright IBM Corporation 2007
> - * Copyright Linas Vepstas <linas@austin.ibm.com> 2007
> - *
> - * All rights reserved.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or (at
> - * your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> - * NON INFRINGEMENT.  See the GNU General Public License for more
> - * details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> - *
> - * Send comments and feedback to Linas Vepstas <linas@austin.ibm.com>
> - */
> -#include <linux/pci.h>
> -#include <linux/stat.h>
> -#include <asm/ppc-pci.h>
> -#include <asm/pci-bridge.h>
> -
> -/**
> - * EEH_SHOW_ATTR -- Create sysfs entry for eeh statistic
> - * @_name: name of file in sysfs directory
> - * @_memb: name of member in struct pci_dn to access
> - * @_format: printf format for display
> - *
> - * All of the attributes look very similar, so just
> - * auto-gen a cut-n-paste routine to display them.
> - */
> -#define EEH_SHOW_ATTR(_name,_memb,_format)               \
> -static ssize_t eeh_show_##_name(struct device *dev,      \
> -		struct device_attribute *attr, char *buf)          \
> -{                                                        \
> -	struct pci_dev *pdev = to_pci_dev(dev);               \
> -	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);      \
> -	                                                      \
> -	if (!edev)                                            \
> -		return 0;                                     \
> -	                                                      \
> -	return sprintf(buf, _format "\n", edev->_memb);       \
> -}                                                        \
> -static DEVICE_ATTR(_name, S_IRUGO, eeh_show_##_name, NULL);
> -
> -EEH_SHOW_ATTR(eeh_mode,            mode,            "0x%x");
> -EEH_SHOW_ATTR(eeh_config_addr,     config_addr,     "0x%x");
> -EEH_SHOW_ATTR(eeh_pe_config_addr,  pe_config_addr,  "0x%x");
> -
> -void eeh_sysfs_add_device(struct pci_dev *pdev)
> -{
> -	int rc=0;
> -
> -	rc += device_create_file(&pdev->dev, &dev_attr_eeh_mode);
> -	rc += device_create_file(&pdev->dev, &dev_attr_eeh_config_addr);
> -	rc += device_create_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
> -
> -	if (rc)
> -		printk(KERN_WARNING "EEH: Unable to create sysfs entries\n");
> -}
> -
> -void eeh_sysfs_remove_device(struct pci_dev *pdev)
> -{
> -	device_remove_file(&pdev->dev, &dev_attr_eeh_mode);
> -	device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr);
> -	device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr);
> -}
> -
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index c91b22b..efe6137 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -64,91 +64,6 @@ pcibios_find_pci_bus(struct device_node *dn)
>   }
>   EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
>
> -/**
> - * __pcibios_remove_pci_devices - remove all devices under this bus
> - * @bus: the indicated PCI bus
> - * @purge_pe: destroy the PE on removal of PCI devices
> - *
> - * Remove all of the PCI devices under this bus both from the
> - * linux pci device tree, and from the powerpc EEH address cache.
> - * By default, the corresponding PE will be destroied during the
> - * normal PCI hotplug path. For PCI hotplug during EEH recovery,
> - * the corresponding PE won't be destroied and deallocated.
> - */
> -void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe)
> -{
> -	struct pci_dev *dev, *tmp;
> -	struct pci_bus *child_bus;
> -
> -	/* First go down child busses */
> -	list_for_each_entry(child_bus, &bus->children, node)
> -		__pcibios_remove_pci_devices(child_bus, purge_pe);
> -
> -	pr_debug("PCI: Removing devices on bus %04x:%02x\n",
> -		pci_domain_nr(bus),  bus->number);
> -	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
> -		pr_debug("     * Removing %s...\n", pci_name(dev));
> -		eeh_remove_bus_device(dev, purge_pe);
> -		pci_stop_and_remove_bus_device(dev);
> -	}
> -}
> -
> -/**
> - * pcibios_remove_pci_devices - remove all devices under this bus
> - *
> - * Remove all of the PCI devices under this bus both from the
> - * linux pci device tree, and from the powerpc EEH address cache.
> - */
> -void pcibios_remove_pci_devices(struct pci_bus *bus)
> -{
> -	__pcibios_remove_pci_devices(bus, 1);
> -}
> -EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices);
> -
> -/**
> - * pcibios_add_pci_devices - adds new pci devices to bus
> - *
> - * This routine will find and fixup new pci devices under
> - * the indicated bus. This routine presumes that there
> - * might already be some devices under this bridge, so
> - * it carefully tries to add only new devices.  (And that
> - * is how this routine differs from other, similar pcibios
> - * routines.)
> - */
> -void pcibios_add_pci_devices(struct pci_bus * bus)
> -{
> -	int slotno, num, mode, pass, max;
> -	struct pci_dev *dev;
> -	struct device_node *dn = pci_bus_to_OF_node(bus);
> -
> -	eeh_add_device_tree_early(dn);
> -
> -	mode = PCI_PROBE_NORMAL;
> -	if (ppc_md.pci_probe_mode)
> -		mode = ppc_md.pci_probe_mode(bus);
> -
> -	if (mode == PCI_PROBE_DEVTREE) {
> -		/* use ofdt-based probe */
> -		of_rescan_bus(dn, bus);
> -	} else if (mode == PCI_PROBE_NORMAL) {
> -		/* use legacy probe */
> -		slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
> -		num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> -		if (!num)
> -			return;
> -		pcibios_setup_bus_devices(bus);
> -		max = bus->busn_res.start;
> -		for (pass=0; pass < 2; pass++)
> -			list_for_each_entry(dev, &bus->devices, bus_list) {
> -			if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
> -			    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
> -				max = pci_scan_bridge(bus, dev, max, pass);
> -		}
> -	}
> -	pcibios_finish_adding_to_bus(bus);
> -}
> -EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
> -
>   struct pci_controller *init_phb_dynamic(struct device_node *dn)
>   {
>   	struct pci_controller *phb;

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/27] powerpc/eeh: Move common part to kernel directory
  2013-06-17  3:03   ` Mike Qiu
@ 2013-06-18  0:55     ` Gavin Shan
  0 siblings, 0 replies; 34+ messages in thread
From: Gavin Shan @ 2013-06-18  0:55 UTC (permalink / raw)
  To: Mike Qiu; +Cc: linuxppc-dev, Gavin Shan

On Mon, Jun 17, 2013 at 11:03:19AM +0800, Mike Qiu wrote:
>=E4=BA=8E 2013/6/15 17:02, Gavin Shan =E5=86=99=E9=81=93:

.../...

>>+
>>+	/* Gather bridge-specific registers */
>>+	if (dev->class >> 16 =3D=3D PCI_BASE_CLASS_BRIDGE) {
>>+		eeh_ops->read_config(dn, PCI_SEC_STATUS, 2, &cfg);
>>+		n +=3D scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
>>+		printk(KERN_WARNING "EEH: Bridge secondary status: %04x\n", cfg);
>>+
>>+		eeh_ops->read_config(dn, PCI_BRIDGE_CONTROL, 2, &cfg);
>>+		n +=3D scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
>>+		printk(KERN_WARNING "EEH: Bridge control: %04x\n", cfg);
>>+	}
>>+
>>+	/* Dump out the PCI-X command and status regs */
>>+	cap =3D pci_find_capability(dev, PCI_CAP_ID_PCIX);
>BTW, when move common part , here you could use dev->pcie_cap at your
>convenience, and pcie_cap has
>been initialized in of_create_pci_dev--->set_pcie_port_type


Thanks, Mike. It's not safe enough to use the cached capability
offsets, which might be invalid when we running into here. However,
we probably use following code in future, but not now :-)

It would save some PCI-CFG access.

	if (dev->pcie_cap)
		cap =3D dev->pcie_cap;
	else
		cap =3D pci_find_capability(dev, PCI_CAP_ID_PCIX);=20

Thanks,
Gavin=20

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-06-18  0:55 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
2013-06-17  3:03   ` Mike Qiu
2013-06-18  0:55     ` Gavin Shan
2013-06-15  9:02 ` [PATCH 02/27] powerpc/eeh: Cleanup for EEH core Gavin Shan
2013-06-15  9:02 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
2013-06-15  9:02 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
2013-06-15  9:02 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
2013-06-15  9:02 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
2013-06-15  9:02 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
2013-06-15  9:02 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
2013-06-15  9:03 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
2013-06-15  9:03 ` [PATCH 10/27] powerpc/eeh: Export confirm_error_lock Gavin Shan
2013-06-15  9:03 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
2013-06-15  9:03 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
2013-06-15  9:03 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
2013-06-15  9:03 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
2013-06-15  9:03 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
2013-06-15  9:03 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
2013-06-15  9:03 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
2013-06-15  9:03 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
2013-06-15  9:03 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
2013-06-15  9:03 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
2013-06-15  9:03 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
2013-06-16  5:12   ` Benjamin Herrenschmidt
2013-06-16  7:27     ` Gavin Shan
2013-06-16  8:37       ` Benjamin Herrenschmidt
2013-06-15  9:03 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
2013-06-15  9:03 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
2013-06-15  9:03 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
2013-06-15  9:03 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
2013-06-15  9:03 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
2013-06-15  9:03 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
  -- strict thread matches above, loose matches on Subject: below --
2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
2013-06-05  7:34 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).