From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Linux Kernel list <linux-kernel@vger.kernel.org>,
	linux-ia64@vger.kernel.org, "Luck, Tony" <tony.luck@intel.com>
Cc: Linas Vepstas <linas@austin.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	long <tlnguyen@snoqualmie.dp.intel.com>,
	linux-pci@atrey.karlin.mff.cuni.cz,
	linuxppc64-dev <linuxppc64-dev@ozlabs.org>
Subject: [PATCH 2.6.13-rc1 08/10] IOCHK interface for I/O error handling/detecting
Date: Wed, 06 Jul 2005 14:18:53 +0900
Message-ID: <42CB69BD.1090607@jp.fujitsu.com>
In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com>

[This is 8 of 10 patches, "iochk-08-mcadrv.patch"]

- Touching poisoned data raises an MCA, which is currently treated
   as a fatal error and brings the system down immediately. But
   since the MCA reports the physical address where the error
   happened, we can take some action to survive.

   If the address lies within a resource of a "check-in" device,
   it is guaranteed that its driver will call iochk_read in
   the very near future, and that the driver now has the
   ability and the responsibility to recover from the error.

   So if it was a "check-in" address, all the OS has to do is mark
   the "check-in" devices and resume normal operation. The driver
   will soon notice the error and handle it properly.

   Note:
     We can identify the affected device, but because of SAL
     behavior (mentioned in patch 6 of 10), we need to mark all
     "check-in" devices. To be fixed in the future, if possible.
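
   For illustration only (not part of this patch): the decision
   described above can be modelled in plain userspace C roughly as
   follows. The types struct res_window and struct checkin_dev, the
   constant NR_WINDOWS, and the functions addr_in_checkin() and
   mca_recoverable() are hypothetical stand-ins, not kernel
   interfaces. Unlike the patch, this sketch skips all-zero windows
   explicitly and does not model 64-bit BAR handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one PCI resource window (inclusive bounds). */
struct res_window {
	uint64_t start;
	uint64_t end;
};

#define NR_WINDOWS 6	/* assumption: six BAR-like windows per device */

/* Hypothetical stand-in for a "check-in" device. */
struct checkin_dev {
	struct res_window win[NR_WINDOWS];
};

/* Return 1 if addr falls inside any window of any check-in device, else 0. */
static int addr_in_checkin(const struct checkin_dev *devs, size_t ndevs,
			   uint64_t addr)
{
	size_t d;
	int i;

	assert(devs != NULL || ndevs == 0);

	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < NR_WINDOWS; i++) {
			const struct res_window *w = &devs[d].win[i];

			/* Unused windows are all-zero in this model; skip them. */
			if (w->start == 0 && w->end == 0)
				continue;
			if (w->start <= addr && addr <= w->end)
				return 1;
		}
	}
	return 0;
}

/*
 * Decision taken by the MCA handler in this sketch:
 *   0: no valid address, or the address is not ours; try other paths.
 *   1: a check-in driver is expected to notice the error; resume.
 */
static int mca_recoverable(const struct checkin_dev *devs, size_t ndevs,
			   uint64_t addr)
{
	if (addr == 0)	/* 0 stands for "no valid target identifier" */
		return 0;
	return addr_in_checkin(devs, ndevs, addr);
}
```

   With one device owning the window 0x1000-0x1fff, an address of
   0x1800 would be treated as recoverable, while 0x2000 or an
   invalid (zero) address would not.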

Changes from the previous version for 2.6.11.11:
   - (none)

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

---

  arch/ia64/kernel/mca_drv.c  |   84 ++++++++++++++++++++++++++++++++++++++++++++
  arch/ia64/lib/iomap_check.c |    1
  2 files changed, 85 insertions(+)

Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
@@ -147,3 +147,4 @@ void clear_bridge_error(struct pci_dev *

  EXPORT_SYMBOL(iochk_read);
  EXPORT_SYMBOL(iochk_clear);
+EXPORT_SYMBOL(iochk_devices);	/* for MCA driver */
Index: linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca_drv.c
+++ linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c
@@ -35,6 +35,12 @@

  #include "mca_drv.h"

+#ifdef CONFIG_IOMAP_CHECK
+#include <linux/pci.h>
+#include <asm/io.h>
+extern struct list_head iochk_devices;
+#endif
+
  /* max size of SAL error record (default) */
  static int sal_rec_max = 10000;

@@ -377,6 +383,79 @@ is_mca_global(peidx_table_t *peidx, pal_
  	return MCA_IS_GLOBAL;
  }

+#ifdef CONFIG_IOMAP_CHECK
+
+/**
+ * get_target_identifier - get address of target_identifier
+ * @peidx:	pointer of index of processor error section
+ *
+ * Return value:
+ *	addr if valid / 0 if not valid
+ */
+static u64 get_target_identifier(peidx_table_t *peidx)
+{
+	sal_log_mod_error_info_t *smei;
+
+	smei = peidx_bus_check(peidx, 0);
+	if (smei->valid.target_identifier)
+		return (smei->target_identifier);
+	return 0;
+}
+
+/**
+ * offending_addr_in_check - Check if the addr is in a check-in device's resource.
+ * @addr:	address offending this MCA
+ *
+ * Return value:
+ *	1 if in / 0 if out
+ */
+static int offending_addr_in_check(u64 addr)
+{
+	int i;
+	struct pci_dev *tdev;
+	iocookie *cookie;
+
+	if (list_empty(&iochk_devices))
+		return 0;
+
+	list_for_each_entry(cookie, &iochk_devices, list) {
+		tdev = cookie->dev;
+		for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+			if (tdev->resource[i].start <= addr
+			    && addr <= tdev->resource[i].end)
+				return 1;
+			if ((tdev->resource[i].flags
+			    & (PCI_BASE_ADDRESS_SPACE|PCI_BASE_ADDRESS_MEM_TYPE_MASK))
+			    == (PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64))
+				i++;
+		}
+	}
+	return 0;
+}
+
+/**
+ * pci_error_recovery - Check if the MCA occurred on a transaction under iochk.
+ * @peidx:	pointer of index of processor error section
+ *
+ * Return value:
+ *	1 if error could be caught in driver / 0 if not
+ */
+static int pci_error_recovery(peidx_table_t *peidx)
+{
+	u64 addr;
+
+	addr = get_target_identifier(peidx);
+	if (!addr)
+		return 0;
+
+	if (offending_addr_in_check(addr))
+		return 1;
+
+	return 0;
+}
+
+#endif /* CONFIG_IOMAP_CHECK */
+
  /**
   * recover_from_read_error - Try to recover the errors which type are "read"s.
   * @slidx:	pointer of index of SAL error record
@@ -399,6 +478,11 @@ recover_from_read_error(slidx_table_t *s
  	if (!pbci->tv)
  		return 0;

+#ifdef CONFIG_IOMAP_CHECK
+	if (pci_error_recovery(peidx))
+		return 1;
+#endif
+
  	/*
  	 * cpu read or memory-mapped io read
  	 *


