From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-ia64@vger.kernel.org, "Luck, Tony" <tony.luck@intel.com>
Cc: Linas Vepstas <linas@austin.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
long <tlnguyen@snoqualmie.dp.intel.com>,
linux-pci@atrey.karlin.mff.cuni.cz,
linuxppc64-dev <linuxppc64-dev@ozlabs.org>
Subject: [PATCH 2.6.13-rc1 08/10] IOCHK interface for I/O error handling/detecting
Date: Wed, 06 Jul 2005 05:18:53 +0000
Message-ID: <42CB69BD.1090607@jp.fujitsu.com>
In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com>
[This is 8 of 10 patches, "iochk-08-mcadrv.patch"]
- Touching poisoned data raises an MCA, which is currently
treated as a fatal error and brings the system down
immediately. But since the MCA reports the physical address
where the error happened, we can take some action to survive.
If the address lies within a resource of a "check-in" device,
it is guaranteed that its driver will call iochk_read in
the very near future, and that the driver now has both the
ability and the responsibility to recover from the error.
So if it was a "check-in" address, all the OS has to do is mark
the "check-in" devices and resume normal work. The driver
will soon notice the error and handle it properly.
Note:
We can identify the affected device, but because of SAL
behavior (mentioned in 6 of 10), we need to mark all
"check-in" devices. To be fixed in the future, if possible.
Changes from the previous version for 2.6.11.11:
- (none)
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
arch/ia64/kernel/mca_drv.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
arch/ia64/lib/iomap_check.c | 1
2 files changed, 85 insertions(+)
Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
@@ -147,3 +147,4 @@ void clear_bridge_error(struct pci_dev *
EXPORT_SYMBOL(iochk_read);
EXPORT_SYMBOL(iochk_clear);
+EXPORT_SYMBOL(iochk_devices); /* for MCA driver */
Index: linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca_drv.c
+++ linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c
@@ -35,6 +35,12 @@
#include "mca_drv.h"
+#ifdef CONFIG_IOMAP_CHECK
+#include <linux/pci.h>
+#include <asm/io.h>
+extern struct list_head iochk_devices;
+#endif
+
/* max size of SAL error record (default) */
static int sal_rec_max = 10000;
@@ -377,6 +383,79 @@ is_mca_global(peidx_table_t *peidx, pal_
return MCA_IS_GLOBAL;
}
+#ifdef CONFIG_IOMAP_CHECK
+
+/**
+ * get_target_identifier - get address of target_identifier
+ * @peidx: pointer of index of processor error section
+ *
+ * Return value:
+ * addr if valid / 0 if not valid
+ */
+static u64 get_target_identifier(peidx_table_t *peidx)
+{
+ sal_log_mod_error_info_t *smei;
+
+ smei = peidx_bus_check(peidx, 0);
+ if (smei->valid.target_identifier)
+ return (smei->target_identifier);
+ return 0;
+}
+
+/**
+ * offending_addr_in_check - Check if the addr is in checking resource.
+ * @addr: address offending this MCA
+ *
+ * Return value:
+ * 1 if in / 0 if out
+ */
+static int offending_addr_in_check(u64 addr)
+{
+ int i;
+ struct pci_dev *tdev;
+ iocookie *cookie;
+
+ if (list_empty(&iochk_devices))
+ return 0;
+
+ list_for_each_entry(cookie, &iochk_devices, list) {
+ tdev = cookie->dev;
+ for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+ if (tdev->resource[i].start <= addr
+ && addr <= tdev->resource[i].end)
+ return 1;
+ if ((tdev->resource[i].flags
+ & (PCI_BASE_ADDRESS_SPACE|PCI_BASE_ADDRESS_MEM_TYPE_MASK))
+ == (PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64))
+ i++;
+ }
+ }
+ return 0;
+}
+
+/**
+ * pci_error_recovery - Check if the MCA occurred on a transaction under iochk.
+ * @peidx: pointer of index of processor error section
+ *
+ * Return value:
+ * 1 if error could be caught in driver / 0 if not
+ */
+static int pci_error_recovery(peidx_table_t *peidx)
+{
+ u64 addr;
+
+ addr = get_target_identifier(peidx);
+ if (!addr)
+ return 0;
+
+ if (offending_addr_in_check(addr))
+ return 1;
+
+ return 0;
+}
+
+#endif /* CONFIG_IOMAP_CHECK */
+
/**
* recover_from_read_error - Try to recover the errors which type are "read"s.
* @slidx: pointer of index of SAL error record
@@ -399,6 +478,11 @@ recover_from_read_error(slidx_table_t *s
if (!pbci->tv)
return 0;
+#ifdef CONFIG_IOMAP_CHECK
+ if (pci_error_recovery(peidx))
+ return 1;
+#endif
+
/*
* cpu read or memory-mapped io read
*
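
The range check in offending_addr_in_check() can be tried outside the kernel with a small model. The following is only a sketch under stated assumptions: struct model_resource, struct model_dev, bar_is_64bit_mem() and addr_in_checked_devices() are hypothetical stand-ins for struct resource, struct pci_dev and the iochk_devices list walk, and are not part of this patch. The flag values mirror the PCI BAR bit layout. It shows why the loop skips one slot after a 64-bit memory BAR: that BAR occupies two consecutive slots, so the next slot is not an independent range.

```c
#include <stddef.h>
#include <stdint.h>

/* PCI BAR flag bits (same layout as in linux/pci_regs.h). */
#define BAR_SPACE_MASK     0x01u  /* bit 0: 0 = memory, 1 = I/O port */
#define BAR_SPACE_MEMORY   0x00u
#define BAR_MEM_TYPE_MASK  0x06u
#define BAR_MEM_TYPE_64    0x04u
#define NUM_BARS           6      /* BAR0..BAR5; the ROM resource is excluded */

/* Hypothetical stand-in for the kernel's struct resource. */
struct model_resource {
    uint64_t start;
    uint64_t end;         /* inclusive, as in the kernel */
    unsigned long flags;
};

/* Hypothetical stand-in for the resources of one "check-in" device. */
struct model_dev {
    struct model_resource resource[NUM_BARS];
};

/* True if the BAR is a memory BAR of the 64-bit type. */
static int bar_is_64bit_mem(const struct model_resource *r)
{
    return (r->flags & (BAR_SPACE_MASK | BAR_MEM_TYPE_MASK))
        == (BAR_SPACE_MEMORY | BAR_MEM_TYPE_64);
}

/* Return 1 if addr falls inside any BAR of any listed device, else 0. */
static int addr_in_checked_devices(const struct model_dev *devs,
                                   size_t ndevs, uint64_t addr)
{
    for (size_t d = 0; d < ndevs; d++) {
        for (int i = 0; i < NUM_BARS; i++) {
            const struct model_resource *r = &devs[d].resource[i];

            if (r->start <= addr && addr <= r->end)
                return 1;
            /* A 64-bit BAR uses two slots; skip its upper half. */
            if (bar_is_64bit_mem(r))
                i++;
        }
    }
    return 0;
}
```

As in the patch, an unused slot with start and end both 0 would match address 0, so the model relies on the caller rejecting a zero address first, which is what pci_error_recovery() does.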