linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Subject: [PATCH 11/31] powerpc/eeh: Trace time on first error for PE
Date: Thu, 20 Jun 2013 13:21:01 +0800	[thread overview]
Message-ID: <1371705681-24632-12-git-send-email-shangw@linux.vnet.ibm.com> (raw)
In-Reply-To: <1371705681-24632-1-git-send-email-shangw@linux.vnet.ibm.com>

We're not expecting that one specific PE got frozen for over 5
times in last hour. Otherwise, the PE will be removed from the
system upon newly coming EEH errors. The patch introduces time
stamp to trace the first error on specific PE in last hour and
function to update that accordingly. Besides, the time stamp
is recovered during PE hotplug path as we did for frozen count.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |    3 +++
 arch/powerpc/kernel/eeh_driver.c |    5 +++++
 arch/powerpc/kernel/eeh_pe.c     |   27 +++++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index beec788..e1109fd 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -24,6 +24,7 @@
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/string.h>
+#include <linux/time.h>
 
 struct pci_dev;
 struct pci_bus;
@@ -62,6 +63,7 @@ struct eeh_pe {
 	struct pci_bus *bus;		/* Top PCI bus for bus PE	*/
 	int check_count;		/* Times of ignored error	*/
 	int freeze_count;		/* Times of froze up		*/
+	struct timeval tstamp;		/* Time on first-time freeze	*/
 	int false_positives;		/* Times of reported #ff's	*/
 	struct eeh_pe *parent;		/* Parent PE			*/
 	struct list_head child_list;	/* Link PE to the child list	*/
@@ -190,6 +192,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
+void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
 		eeh_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fb927af..678bc6c 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -349,10 +349,12 @@ static void *eeh_report_failure(void *data, void *userdata)
  */
 static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 {
+	struct timeval tstamp;
 	int cnt, rc;
 
 	/* pcibios will clear the counter; save the value */
 	cnt = pe->freeze_count;
+	tstamp = pe->tstamp;
 
 	/*
 	 * We don't remove the corresponding PE instances because
@@ -385,6 +387,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		ssleep(5);
 		pcibios_add_pci_devices(bus);
 	}
+
+	pe->tstamp = tstamp;
 	pe->freeze_count = cnt;
 
 	return 0;
@@ -425,6 +429,7 @@ void eeh_handle_event(struct eeh_pe *pe)
 		return;
 	}
 
+	eeh_pe_update_time_stamp(pe);
 	pe->freeze_count++;
 	if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES)
 		goto excess_failures;
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index c963667..ae75722 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -482,6 +482,33 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe)
 }
 
 /**
+ * eeh_pe_update_time_stamp - Update PE's frozen time stamp
+ * @pe: EEH PE
+ *
+ * We have time stamp for each PE to trace its time of getting
+ * frozen in last hour. The function should be called to update
+ * the time stamp on first error of the specific PE. On the other
+ * handle, we needn't account for errors happened in last hour.
+ */
+void eeh_pe_update_time_stamp(struct eeh_pe *pe)
+{
+	struct timeval tstamp;
+
+	if (!pe) return;
+
+	if (pe->freeze_count <= 0) {
+		pe->freeze_count = 0;
+		do_gettimeofday(&pe->tstamp);
+	} else {
+		do_gettimeofday(&tstamp);
+		if (tstamp.tv_sec - pe->tstamp.tv_sec > 3600) {
+			pe->tstamp = tstamp;
+			pe->freeze_count = 0;
+		}
+	}
+}
+
+/**
  * __eeh_pe_state_mark - Mark the state for the PE
  * @data: EEH PE
  * @flag: state
-- 
1.7.5.4

  parent reply	other threads:[~2013-06-20  5:21 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-20  5:20 [PATCH v6 00/31] EEH Support for PowerNV platform Gavin Shan
2013-06-20  5:20 ` [PATCH 01/31] powerpc/eeh: Cleanup for EEH core Gavin Shan
2013-06-20  5:20 ` [PATCH 02/31] powerpc/eeh: Move common part to kernel directory Gavin Shan
2013-06-20  5:20 ` [PATCH 03/31] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
2013-06-20  5:20 ` [PATCH 04/31] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
2013-06-20  5:20 ` [PATCH 05/31] powerpc/eeh: Trace PCI bus from PE Gavin Shan
2013-06-20  5:20 ` [PATCH 06/31] powerpc/eeh: Make eeh_init() public Gavin Shan
2013-06-20  5:20 ` [PATCH 07/31] powerpc/eeh: EEH post initialization operation Gavin Shan
2013-06-20  5:20 ` [PATCH 08/31] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
2013-06-20  5:20 ` [PATCH 09/31] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
2013-06-20  5:21 ` [PATCH 10/31] powerpc/eeh: Single kthread to handle events Gavin Shan
2013-06-20  5:21 ` Gavin Shan [this message]
2013-06-20  5:21 ` [PATCH 12/31] powerpc/eeh: Allow to purge EEH events Gavin Shan
2013-06-20  5:21 ` [PATCH 13/31] powerpc/eeh: Export confirm_error_lock Gavin Shan
2013-06-20  5:21 ` [PATCH 14/31] powerpc/eeh: EEH core to handle special event Gavin Shan
2013-06-20  5:21 ` [PATCH 15/31] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
2013-06-20  5:21 ` [PATCH 16/31] powerpc/eeh: EEH backend for P7IOC Gavin Shan
2013-06-20  5:21 ` [PATCH 17/31] powerpc/eeh: I/O chip post initialization Gavin Shan
2013-06-20  5:21 ` [PATCH 18/31] powerpc/eeh: I/O chip EEH enable option Gavin Shan
2013-06-20  5:21 ` [PATCH 19/31] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
2013-06-20  5:21 ` [PATCH 20/31] powerpc/eeh: I/O chip PE reset Gavin Shan
2013-06-20  5:21 ` [PATCH 21/31] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
2013-06-20  5:21 ` [PATCH 22/31] powerpc/eeh: I/O chip next error Gavin Shan
2013-06-20  5:21 ` [PATCH 23/31] powerpc/eeh: PowerNV EEH backends Gavin Shan
2013-06-20  5:21 ` [PATCH 24/31] powerpc/eeh: Initialization for PowerNV Gavin Shan
2013-06-20  5:21 ` [PATCH 25/31] powerpc/eeh: Enable EEH check for config access Gavin Shan
2013-06-20  5:21 ` [PATCH 26/31] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
2013-06-20  5:21 ` [PATCH 27/31] powernv/opal: Notifier for OPAL events Gavin Shan
2013-06-20  5:21 ` [PATCH 28/31] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
2013-06-20  5:21 ` [PATCH 29/31] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
2013-06-20  5:21 ` [PATCH 30/31] powerpc/powernv: Debugfs directory for PHB Gavin Shan
2013-06-20  5:21 ` [PATCH 31/31] powerpc/eeh: Debugfs for error injection Gavin Shan
  -- strict thread matches above, loose matches on Subject: below --
2013-06-18  8:33 [PATCH v5 00/31] EEH Support for PowerNV platform Gavin Shan
2013-06-18  8:33 ` [PATCH 11/31] powerpc/eeh: Trace time on first error for PE Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1371705681-24632-12-git-send-email-shangw@linux.vnet.ibm.com \
    --to=shangw@linux.vnet.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).