All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-kernel@vger.kernel.org
Subject: [RFC&PATCH] PCI Error Recovery (readX_check)
Date: Fri, 20 Aug 2004 15:23:43 +0900	[thread overview]
Message-ID: <412598EF.6080905@jp.fujitsu.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 4971 bytes --]

Hi, all

This is a request for comments about "PCI error recovery."

Some little time (six months!) ago, we had a discussion about this topic,
and at the last we came to the conclusion that "check the bridge status."
Based on this, I have reconsider the design and refine the implementation.
I'd really appreciate it if you could find a problem of the following ideas
and could feedback it to me.

-----

At first, again, my goal is:
  "Enable some possible error recovery on drivers by error notification."

Today, errors on some type of transaction such as:
  - memory mapped I/O write
  - DMA from device to memory
  - DMA from memory to device
are recoverable because the error (ex. parity error) is visible on the
status register of the device.

However, in the case of memory mapped I/O read, the result of the
transaction is not visible on the device. Whether there had a error
or not is visible on bus bridges locates upper the device.

We have to be careful since there are some platforms that host bus bridges
couldn't be used for recovery (ex. host isn't PCI device.)
If so, we have to use a PCI-to-PCI bridge just under the host bridge
instead of using the host bridge.

Put simply, we need to check the status of the "highest" bus bridge only
in the case of mmio read.

And, the status of the highest bridge is shared by some other devices,
so clearing the status from one device could affect checking from the
other device.  Therefore, to check the status of the highest bridge, we
should handle the status from out of device drivers, never in each drivers.

So what I have to consider next is the implementation of notification on
mmio read which we should check the highest bridge status.

---

Okay, let's talk about the actual implementation :-)

The requirement is:
  "Guarantee the result of mmio read on error while multiple mmio read
     by devices under same bridge is running."

To realize this, satisfy all of 1 to 3:

#1:
  We have to know which device under the same bridge doing mmio read.

Add a list "working_device" on the pci_dev struct to check the running
device.  Register the device to the highest bridge device at the beginning
of a session containing some mmio reads, and unregister the device at the
end of the session.

#2:
  Clear the bridge status when no devices under the bridge do mmio read.

Take a lock during a mmio read which could change the status, and during
a clearing the status.  Logically, rwlock is convenient.
  - Processing mmio read: read_lock
  - Clearing the status:  write_lock
To reduce more load on this lock, check the status before clear it, and
skip clearing if the status have no error.

#3:
  Send the error status to all drivers in session before clear it.

There is no way to know which device cause a error if there are multiple
devices doing mmio read.  (Using spinlock instead of rwlock is a possible
way, but clearly it would impact on I/O performance, so I ignore the way.)
Thus, the best we can do now is send the error to all concerning driver.
I mean, notify the error to all devices registered to the "working_device"
list of the highest bridge, by updating "err_status" value which newly
added to the pci_dev struct.  Note that we should take the write_lock from
the updating "err_status" to the clearing the bridge status.  Or the result
of I/O between the updating and the clearing would vanish into the night.

---

I suppose the basic mmio read of RAS-aware driver as like as following:

========================================================================
int retries = 2;

do {
     clear_pci_errors(dev);		/* clear bridge status */
     val = readX_check(dev, addr);	/* memory mapped I/O read */
     status = read_pci_errors(dev);	/* check bridge status */
} while (status && --retries > 0);
if (status)
	/* error */
========================================================================

The basic design of new-found functions are:

$1. clear_pci_errors(DEVICE)
 - find the highest(host or top PCI-to-PCI) bridge of DEVICE
 - check the status of the highest bridge, and if it indicates error(s):
   - write_lock
   - update each err_status of all devices registered to the working_device
     list of the highest bridge, by "|=" the value of the bridge status
   - clear the bridge status
   - write_unlock
 - clear the err_status of DEVICE
 - register DEVICE to the highest bridge

$2. readX_check(DEVICE, ADDR)
 - read_lock
 - I/O (read)
 - read_unlock

$3. read_pci_errors(DEVICE)
 - find the highest bridge of DEVICE
 - store the status of the highest bridge as STATUS
 - check ( STATUS | DEVICE->err_status )
 - return 1 if error (ex. Master/Target Abort, Party Error), else return 0


Note:
This time, here is no initialize for the control register of the highest
bridge.  The generic initialization could be implemented in the code,
but the values are user configurable and occasionally some bus needs
specific value, so now I don't write it yet.


Thanks,
H.Seto


[-- Attachment #2: patch-pci-rec-2.6.8.1 --]
[-- Type: text/plain, Size: 13184 bytes --]

diff -Nur /usr/src/linux-2.6.8.1/arch/i386/pci/common.c /tmp/linux-2.6.8.1/arch/i386/pci/common.c
--- /usr/src/linux-2.6.8.1/arch/i386/pci/common.c	2004-06-16 14:19:17.000000000 +0900
+++ /tmp/linux-2.6.8.1/arch/i386/pci/common.c	2004-08-20 14:05:05.662594940 +0900
@@ -118,6 +118,19 @@
 	pci_read_bridge_bases(b);
 }
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+static void pci_i386_fixup_host_bridge_self(struct pci_bus *bus)
+{
+	struct pci_dev *pdev;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list) {
+		if (pdev->class >> 8 == PCI_CLASS_BRIDGE_HOST) {
+			bus->self = pdev;
+			return;
+		}
+	}
+}
+#endif
 
 struct pci_bus * __devinit pcibios_scan_root(int busnum)
 {
@@ -132,7 +145,16 @@
 
 	printk("PCI: Probing PCI hardware (bus %02x)\n", busnum);
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+{
+	struct pci_bus *bus = pci_scan_bus(busnum, &pci_root_ops, NULL);
+	if (bus) 
+		pci_i386_fixup_host_bridge_self(bus);
+	return bus;
+}
+#else
 	return pci_scan_bus(busnum, &pci_root_ops, NULL);
+#endif
 }
 
 extern u8 pci_cache_line_size;
diff -Nur /usr/src/linux-2.6.8.1/drivers/pci/Kconfig /tmp/linux-2.6.8.1/drivers/pci/Kconfig
--- /usr/src/linux-2.6.8.1/drivers/pci/Kconfig	2004-08-20 14:03:24.000000000 +0900
+++ /tmp/linux-2.6.8.1/drivers/pci/Kconfig	2004-08-20 14:05:05.662594940 +0900
@@ -47,3 +47,10 @@
 
 	  When in doubt, say Y.
 
+config PCI_ERROR_RECOVERY
+	bool "PCI device error recovery"
+	depends on PCI
+	---help---
+	By default, the device driver hardly recovers from PCI errors. When 
+	this feature is available, the special io interface are provided 
+	from the kernel.
diff -Nur /usr/src/linux-2.6.8.1/drivers/pci/Makefile /tmp/linux-2.6.8.1/drivers/pci/Makefile
--- /usr/src/linux-2.6.8.1/drivers/pci/Makefile	2004-08-20 14:03:24.000000000 +0900
+++ /tmp/linux-2.6.8.1/drivers/pci/Makefile	2004-08-20 14:05:05.663571502 +0900
@@ -5,6 +5,7 @@
 obj-y		+= access.o bus.o probe.o remove.o pci.o quirks.o \
 			names.o pci-driver.o search.o pci-sysfs.o
 obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PCI_ERROR_RECOVERY) += pci-rec.o
 
 ifndef CONFIG_SPARC64
 obj-y += setup-res.o
diff -Nur /usr/src/linux-2.6.8.1/drivers/pci/pci-rec.c /tmp/linux-2.6.8.1/drivers/pci/pci-rec.c
--- /usr/src/linux-2.6.8.1/drivers/pci/pci-rec.c	1970-01-01 09:00:00.000000000 +0900
+++ /tmp/linux-2.6.8.1/drivers/pci/pci-rec.c	2004-08-20 14:05:05.664548065 +0900
@@ -0,0 +1,222 @@
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/spinlock.h>
+#include <asm/pci.h>
+#include <asm/page.h>
+#include <asm/dma.h>    /* isa_dma_bridge_buggy */
+#include <asm/semaphore.h>
+#include <linux/list.h>
+#include <linux/rwsem.h>
+
+static rwlock_t pci_whole_lock;
+
+void
+pcibus_lock_init(struct pci_bus *bus)
+{
+	init_rwsem(&bus->bus_lock);
+}
+
+void 
+pcibus_lock(struct pci_bus *bus)
+{
+	down_read(&bus->bus_lock);
+}
+
+void
+pcibus_unlock(struct pci_bus *bus)
+{
+	up_read(&bus->bus_lock);
+}
+
+void
+pciwhole_lock_init(void)
+{
+	rwlock_init(&pci_whole_lock);
+}
+
+void
+pciwhole_read_lock(void)
+{
+        read_lock(&pci_whole_lock);
+}
+
+void
+pciwhole_read_unlock(void)
+{
+        read_unlock(&pci_whole_lock);
+}
+
+void
+pciwhole_write_lock(void)
+{
+	write_lock(&pci_whole_lock);
+}
+
+void
+pciwhole_write_unlock(void)
+{
+	write_unlock(&pci_whole_lock);
+}
+
+#define IF_IS_PCI_ERROR(status)                         \
+        if ((status & PCI_STATUS_REC_TARGET_ABORT)      \
+          ||(status & PCI_STATUS_REC_MASTER_ABORT)      \
+          ||(status & PCI_STATUS_DETECTED_PARITY))
+
+void 
+clear_pci_errors(struct pci_dev *dev) 
+{
+	u16 status;
+	struct pci_dev *prootdev, *pdev;
+	struct pci_bus *pbus;
+
+	/* find root bus bridge */
+	if (!dev->bus->self) return;
+	for (pbus = dev->bus; pbus->parent && pbus->parent->self; pbus = pbus->parent) ;
+	prootdev = pbus->self;
+	/* check status (host bus bridge or PCI to PCI bridge) */
+	switch (prootdev->hdr_type) {
+		case PCI_HEADER_TYPE_NORMAL: /* 0 */
+			pci_read_config_word(prootdev, PCI_STATUS, &status);
+			break;
+		case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+			pci_read_config_word(prootdev, PCI_SEC_STATUS, &status);
+			break;
+		case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+		default:
+			BUG();
+	}
+	/* "status" holds error status or not. */
+	IF_IS_PCI_ERROR(status) {
+		pciwhole_write_lock();
+		if (!list_empty(&prootdev->working_device)) {
+			/* apply for all devices under same root bus bridges */
+			list_for_each_entry(pdev, &prootdev->working_device, working_device) {
+				pdev->err_status |= status;
+			}
+		}
+		/* status clear */
+		switch (prootdev->hdr_type) {
+			case PCI_HEADER_TYPE_NORMAL: /* 0 */
+				pci_write_config_word(prootdev, PCI_STATUS, status);
+				break;
+			case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+				pci_write_config_word(prootdev, PCI_SEC_STATUS, status);
+				break;
+			case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+			default:
+				BUG();
+		}
+		pciwhole_write_unlock();
+	}
+	/* initialize error holding variables */
+	dev->err_status = 0;
+	/* register to root bus bridge for Multiple I/O */
+	pcibus_lock(pbus);
+	list_add(&dev->working_device, &prootdev->working_device);
+	pcibus_unlock(pbus);
+}
+
+int 
+read_pci_errors(struct pci_dev *dev) 
+{
+	u32	status = 0;
+	u16	raw_br_status;
+	struct pci_dev *prootdev;
+	struct pci_bus *pbus;
+
+	/* find root bus bridge */
+	if (!dev->bus->self) 
+		return 0; /* pseudo-no error */
+	for (pbus = dev->bus; pbus->parent && pbus->parent->self; 
+						pbus = pbus->parent) ;
+	prootdev = pbus->self;
+	/* unregister from root bus bridge */
+	pcibus_lock(pbus);
+	if (list_empty(&prootdev->working_device)) {
+		BUG();
+		return 0;
+	}
+	list_del(&dev->working_device);
+	pcibus_unlock(pbus);
+	/* check status (host bus bridge or PCI to PCI bridge) */
+	switch (prootdev->hdr_type) {
+		case PCI_HEADER_TYPE_NORMAL: /* 0 */
+			pci_read_config_word(prootdev, PCI_STATUS, &raw_br_status);
+			break;
+		case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+			pci_read_config_word(prootdev, PCI_SEC_STATUS, &raw_br_status);
+			break;
+		case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+		default:
+			BUG();
+	}
+	status |= raw_br_status;	/* current bridge status */
+	status |= dev->err_status;	/* saved bridge status+ */
+	IF_IS_PCI_ERROR(status) {	/* Target/Master Abort */
+		return 1;
+	}
+	return 0;
+}
+
+void pcidev_extra_init(struct pci_dev *dev)
+{
+	INIT_LIST_HEAD(&dev->working_device);
+	dev->err_status = 0;
+}
+
+u8
+readb_check(struct pci_dev *dev, u8 *addr)
+{
+	u8 val;
+
+	pciwhole_read_lock();
+	val = _readb_check(addr);
+	pciwhole_read_unlock();
+
+	return val;
+}
+
+u16
+readw_check(struct pci_dev *dev, u16 *addr)
+{
+	u16 val;
+
+	pciwhole_read_lock();
+	val = _readw_check(addr);
+	pciwhole_read_unlock();
+
+	return val;
+}
+
+u32
+readl_check(struct pci_dev *dev, u32 *addr)
+{
+	u32 val;
+
+	pciwhole_read_lock();
+	val = _readl_check(addr);
+	pciwhole_read_unlock();
+
+	return val;
+}
+
+EXPORT_SYMBOL (pcibus_lock_init);
+EXPORT_SYMBOL (pcibus_lock);
+EXPORT_SYMBOL (pcibus_unlock);
+EXPORT_SYMBOL (pciwhole_lock_init);
+EXPORT_SYMBOL (pciwhole_read_lock);
+EXPORT_SYMBOL (pciwhole_read_unlock);
+EXPORT_SYMBOL (pciwhole_write_lock);
+EXPORT_SYMBOL (pciwhole_write_unlock);
+EXPORT_SYMBOL (pcidev_extra_init);
+EXPORT_SYMBOL (clear_pci_errors);
+EXPORT_SYMBOL (read_pci_errors);
+EXPORT_SYMBOL (readb_check);
+EXPORT_SYMBOL (readw_check);
+EXPORT_SYMBOL (readl_check);
diff -Nur /usr/src/linux-2.6.8.1/drivers/pci/pci.c /tmp/linux-2.6.8.1/drivers/pci/pci.c
--- /usr/src/linux-2.6.8.1/drivers/pci/pci.c	2004-08-20 14:03:24.000000000 +0900
+++ /tmp/linux-2.6.8.1/drivers/pci/pci.c	2004-08-20 14:05:05.664548065 +0900
@@ -750,6 +750,9 @@
 	while ((dev = pci_find_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
 		pci_fixup_device(PCI_FIXUP_FINAL, dev);
 	}
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+	pciwhole_lock_init();
+#endif
 	return 0;
 }
 
diff -Nur /usr/src/linux-2.6.8.1/drivers/pci/probe.c /tmp/linux-2.6.8.1/drivers/pci/probe.c
--- /usr/src/linux-2.6.8.1/drivers/pci/probe.c	2004-08-20 14:03:24.000000000 +0900
+++ /tmp/linux-2.6.8.1/drivers/pci/probe.c	2004-08-20 14:05:05.665524627 +0900
@@ -270,6 +270,9 @@
 		INIT_LIST_HEAD(&b->node);
 		INIT_LIST_HEAD(&b->children);
 		INIT_LIST_HEAD(&b->devices);
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+		pcibus_lock_init(b);
+#endif
 	}
 	return b;
 }
@@ -624,6 +627,9 @@
 
 	dev->dev.dma_mask = &dev->dma_mask;
 	dev->dev.coherent_dma_mask = 0xffffffffull;
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+	pcidev_extra_init(dev);
+#endif
 
 	return dev;
 }
diff -Nur /usr/src/linux-2.6.8.1/include/asm-i386/io.h /tmp/linux-2.6.8.1/include/asm-i386/io.h
--- /usr/src/linux-2.6.8.1/include/asm-i386/io.h	2004-06-16 14:20:26.000000000 +0900
+++ /tmp/linux-2.6.8.1/include/asm-i386/io.h	2004-08-20 14:05:05.665524627 +0900
@@ -364,4 +364,19 @@
 BUILDIO(w,w,short)
 BUILDIO(l,,int)
 
+static inline unsigned char _readb_check(unsigned char *addr)
+{
+	return readb(addr);
+} 
+
+static inline unsigned short _readw_check(unsigned short *addr)
+{
+	return readw(addr);
+} 
+
+static inline unsigned int _readl_check(unsigned int *addr)
+{
+	return readl(addr);
+} 
+
 #endif
diff -Nur /usr/src/linux-2.6.8.1/include/asm-i386/pci.h /tmp/linux-2.6.8.1/include/asm-i386/pci.h
--- /usr/src/linux-2.6.8.1/include/asm-i386/pci.h	2004-06-16 14:19:22.000000000 +0900
+++ /tmp/linux-2.6.8.1/include/asm-i386/pci.h	2004-08-20 14:05:05.665524627 +0900
@@ -99,6 +99,11 @@
 {
 }
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+u8 readb_check(struct pci_dev *, u8*);
+u16 readw_check(struct pci_dev *, u16*);
+u32 readl_check(struct pci_dev *, u32*);
+#endif
 #endif /* __KERNEL__ */
 
 /* implement the pci_ DMA API in terms of the generic device dma_ one */
diff -Nur /usr/src/linux-2.6.8.1/include/asm-ia64/io.h /tmp/linux-2.6.8.1/include/asm-ia64/io.h
--- /usr/src/linux-2.6.8.1/include/asm-ia64/io.h	2004-06-16 14:18:57.000000000 +0900
+++ /tmp/linux-2.6.8.1/include/asm-ia64/io.h	2004-08-20 14:05:05.666501190 +0900
@@ -457,4 +457,39 @@
 #define BIO_VMERGE_BOUNDARY	(ia64_max_iommu_merge_mask + 1)
 #endif
 
+static inline unsigned char 
+_readb_check(unsignged char *addr)
+{
+	register unsigned long gr8 asm("r8");
+	unsigned char val;
+
+	val = readb(addr);
+	asm volatile ("add %0=%1,r0" : "=r"(gr8) : "r"(val));
+
+	return val;
+}
+
+static inline unsigned short 
+_readw_check(unsignged short *addr)
+{
+	register unsigned long gr8 asm("r8");
+	unsigned short val;
+
+	val = readw(addr);
+	asm volatile ("add %0=%1,r0" : "=r"(gr8) : "r"(val));
+
+	return val;
+}
+
+static inline unsigned int 
+_readl_check(unsignged int *addr)
+{
+	register unsigned long gr8 asm("r8");
+	unsigned int val;
+
+	val = readl(addr);
+	asm volatile ("add %0=%1,r0" : "=r"(gr8) : "r"(val));
+
+	return val;
+}
 #endif /* _ASM_IA64_IO_H */
diff -Nur /usr/src/linux-2.6.8.1/include/asm-ia64/pci.h /tmp/linux-2.6.8.1/include/asm-ia64/pci.h
--- /usr/src/linux-2.6.8.1/include/asm-ia64/pci.h	2004-06-16 14:20:03.000000000 +0900
+++ /tmp/linux-2.6.8.1/include/asm-ia64/pci.h	2004-08-20 14:05:05.666501190 +0900
@@ -119,6 +119,11 @@
 {
 }
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+u8 readb_check(struct pci_dev *, u8*);
+u16 readw_check(struct pci_dev *, u16*);
+u32 readl_check(struct pci_dev *, u32*);
+#endif
 /* generic pci stuff */
 #include <asm-generic/pci.h>
 
diff -Nur /usr/src/linux-2.6.8.1/include/linux/pci.h /tmp/linux-2.6.8.1/include/linux/pci.h
--- /usr/src/linux-2.6.8.1/include/linux/pci.h	2004-08-20 14:03:25.000000000 +0900
+++ /tmp/linux-2.6.8.1/include/linux/pci.h	2004-08-20 14:09:29.082513588 +0900
@@ -486,6 +486,10 @@
 struct pci_dev {
 	struct list_head global_list;	/* node in list of all PCI devices */
 	struct list_head bus_list;	/* node in per-bus list */
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+	struct list_head working_device;/* working device */
+	unsigned        err_status;     /* root bridge status and other error */
+#endif
 	struct pci_bus	*bus;		/* bus this device is on */
 	struct pci_bus	*subordinate;	/* bus this device bridges to */
 
@@ -568,6 +572,9 @@
 
 struct pci_bus {
 	struct list_head node;		/* node in list of buses */
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+       struct rw_semaphore bus_lock;   /* for serialized operation under same root bridge */
+#endif
 	struct pci_bus	*parent;	/* parent bus this bridge is on */
 	struct list_head children;	/* list of child buses */
 	struct list_head devices;	/* list of devices on this bus */
@@ -1021,5 +1028,21 @@
 #define PCIPCI_VSFX		16
 #define PCIPCI_ALIMAGIK		32
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+void pcidev_extra_init(struct pci_dev *);
+void pcibus_lock_init(struct pci_bus *);
+void pcibus_lock(struct pci_bus *);
+void pcibus_unlock(struct pci_bus *);
+void pciwhole_lock_init(void);
+void pciwhole_read_lock(void);
+void pciwhole_read_unlock(void);
+void pciwhole_write_lock(void);
+void pciwhole_write_unlock(void);
+void clear_pci_errors(struct pci_dev *);
+int read_pci_errors(struct pci_dev *);
+u8  readb_check(struct pci_dev *, u8 *);
+u16 readw_check(struct pci_dev *, u16 *);
+u32 readl_check(struct pci_dev *, u32 *);
+#endif
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */

                 reply	other threads:[~2004-08-20  6:24 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=412598EF.6080905@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.