From: linas <linas@austin.ibm.com>
To: paulus@samba.org
Cc: linuxppc64-dev@ozlabs.org, linux-kernel@vger.kernel.org
Subject: [PATCH 7/7] ppc64: EEH Halt if bad drivers spin in error condition
Date: Thu, 29 Sep 2005 20:02:28 -0500
Message-ID: <20050930010228.GG6173@austin.ibm.com>
In-Reply-To: <20050930004800.GL29826@austin.ibm.com>


07-eeh-spin-counter.patch

Once an EEH event is triggered, all further I/O to a device is blocked (until
reset).  Bad device drivers may end up spinning in their interrupt handlers,
trying to read an interrupt status register that will never change state.
The kernel counts these failed reads and panics once EEH_MAX_FAILS is
reached.  This patch moves that spin counter from a single global into a
per-device structure, raises the threshold from 1000 to 100000, and adds
some diagnostic prints to help locate the bad driver.

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-rc2-git6.orig/arch/ppc64/kernel/eeh.c	2005-09-29 16:29:00.726884805 -0500
+++ linux-2.6.14-rc2-git6/arch/ppc64/kernel/eeh.c	2005-09-29 16:32:05.550949258 -0500
@@ -78,14 +78,12 @@
 
 static struct notifier_block *eeh_notifier_chain;
 
-/*
- * If a device driver keeps reading an MMIO register in an interrupt
+/* If a device driver keeps reading an MMIO register in an interrupt
  * handler after a slot isolation event has occurred, we assume it
  * is broken and panic.  This sets the threshold for how many read
  * attempts we allow before panicking.
  */
-#define EEH_MAX_FAILS	1000
-static atomic_t eeh_fail_count;
+#define EEH_MAX_FAILS	100000
 
 /* RTAS tokens */
 static int ibm_set_eeh_option;
@@ -521,7 +519,6 @@
 		       "%s\n", event->reset_state,
 		       pci_name(event->dev));
 
-		atomic_set(&eeh_fail_count, 0);
 		notifier_call_chain (&eeh_notifier_chain,
 				     EEH_NOTIFY_FREEZE, event);
 
@@ -657,12 +654,18 @@
 	spin_lock_irqsave(&confirm_error_lock, flags);
 	rc = 1;
 	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
-		atomic_inc(&eeh_fail_count);
-		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
+		pdn->eeh_check_count ++;
+		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
+			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicking\n",
+			        pdn->eeh_check_count);
+			dump_stack();
+			
 			/* re-read the slot reset state */
 			if (read_slot_reset_state(pdn, rets) != 0)
 				rets[0] = -1;	/* reset state unknown */
-			eeh_panic(dev, rets[0]);
+
+			/* If we are here, then we hit an infinite loop. Stop. */
+			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
 		}
 		goto dn_unlock;
 	}
@@ -808,6 +811,8 @@
 	struct pci_dn *pdn = PCI_DN(dn);
 
 	pdn->eeh_mode = 0;
+	pdn->eeh_check_count = 0;
+	pdn->eeh_freeze_count = 0;
 
 	if (status && strcmp(status, "ok") != 0)
 		return NULL;	/* ignore devices with bad status */
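
For readers who want to see the per-device counting in isolation, here is a
minimal userspace sketch of the idea. It is not the kernel code: struct
sketch_dev, SKETCH_MAX_FAILS and both functions are hypothetical stand-ins for
struct pci_dn, EEH_MAX_FAILS and the check in the hunk above, and the sketch
returns true where the kernel would call dump_stack() and panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold, mirroring EEH_MAX_FAILS in the patch. */
#define SKETCH_MAX_FAILS 100000

/* Hypothetical stand-in for the per-device fields kept in struct pci_dn. */
struct sketch_dev {
	bool isolated;    /* stands in for the EEH_MODE_ISOLATED flag */
	int check_count;  /* stands in for eeh_check_count */
};

/* Clear the per-device state, as the patch does when a device is set up. */
void sketch_dev_init(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->isolated = false;
	dev->check_count = 0;
}

/*
 * Record one read against a device.  Only reads against an isolated
 * device are counted.  Returns true once this device, and this device
 * alone, has reached the threshold.
 */
bool sketch_dev_check_failure(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (!dev->isolated)
		return false;
	dev->check_count++;
	return dev->check_count >= SKETCH_MAX_FAILS;
}
```

Because the count lives in the device structure, a driver spinning on one
isolated device cannot push a different device over the threshold, which a
single global counter could not guarantee.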

Thread overview: 17+ messages
2005-09-30  0:48 [PATCH 0/7] ppc64: Assorted minor EEH cleanups linas
2005-09-30  0:51 ` [PATCH 1/7] ppc64: EEH typos, include files, macros, whitespace linas
2005-10-05 11:11   ` Paul Mackerras
2005-10-07 19:46     ` linas
2005-09-30  0:53 ` [PATCH 2/7] ppc64: EEH PCI address cache cleanups linas
2005-09-30  0:54 ` [PATCH 3/7] ppc64: EEH Add event/internal state statistics linas
2005-10-05 11:14   ` Paul Mackerras
2005-10-07 14:59     ` linas
2005-09-30  0:56 ` [PATCH 4/7] ppc64: EEH PCI slot error details abstraction linas
2005-09-30  0:58 ` [PATCH 5/7] ppc64: EEH handle empty PCI slot failure linas
2005-09-30  1:00 ` [PATCH 6/7] ppc64: EEH Avoid racing reports of errors linas
2005-10-05 11:23   ` Paul Mackerras
2005-10-07 15:23     ` linas
2005-09-30  1:02 ` linas [this message]
2005-09-30  4:49   ` [PATCH 7/7] ppc64: EEH Halt if bad drivers spin in error condition Doug Maxey
2005-09-30 14:58     ` linas
2005-09-30 22:29 ` [PATCH 0/7] ppc64: Assorted minor EEH cleanups linas
