linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift
@ 2010-11-30 22:27 Don Zickus
  2010-11-30 22:27 ` [PATCH 1/9] x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR Don Zickus
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

Restructuring the nmi handler to be more readable and simpler.

This is just laying the ground work for future improvements in this area.

I also left out one of Huang's patch until we figure out how we are going
to proceed with a new notifier.

Tested 32-bit and 64-bit on AMD and Intel machines.

V3:  rebased on top of removal of nmi watchdog
     add a couple more watchdog patches from my queue

V2:  add a patch to kill DIE_NMI_IPI and add in priorities

Cheers,
Don

Don Zickus (5):
  x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  x86: only call smp_processor_id in non-preempt cases
  panic:  ratelimit panic messages
  watchdog:  touch_nmi_watchdog should only touch local cpu not every
    one

Dongdong Deng (1):
  x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time

Huang Ying (3):
  x86, NMI: Add NMI symbol constants and rename memory parity to PCI
    SERR
  x86, NMI: Add touch_nmi_watchdog to io_check_error delay
  x86, NMI: Rewrite NMI handler
  x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU

 Documentation/kernel-parameters.txt     |    6 ++
 arch/x86/include/asm/kdebug.h           |    1 -
 arch/x86/include/asm/mach_traps.h       |   12 +++-
 arch/x86/include/asm/nmi.h              |   20 ++++++
 arch/x86/kernel/apic/hw_nmi.c           |   19 ++++-
 arch/x86/kernel/apic/x2apic_uv_x.c      |    2 +-
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    5 +-
 arch/x86/kernel/cpu/perf_event.c        |    3 +-
 arch/x86/kernel/kgdb.c                  |    6 +--
 arch/x86/kernel/reboot.c                |    5 +-
 arch/x86/kernel/traps.c                 |  114 ++++++++++++++++---------------
 arch/x86/oprofile/nmi_int.c             |    3 +-
 arch/x86/oprofile/nmi_timer_int.c       |    2 +-
 drivers/char/ipmi/ipmi_watchdog.c       |    2 +-
 drivers/watchdog/hpwdt.c                |    2 +-
 kernel/panic.c                          |   30 ++++++++
 kernel/watchdog.c                       |   17 +++--
 17 files changed, 166 insertions(+), 83 deletions(-)

-- 
1.7.3.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/9] x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 2/9] x86, NMI: Add touch_nmi_watchdog to io_check_error delay Don Zickus
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

From: Huang Ying <ying.huang@intel.com>

Replace the NMI related magic numbers with symbol constants.

memory parity error is only valid for IBM PC-AT, newer machine use
bit 7 (0x80) of 0x61 port for PCI SERR. While memory error is usually
reported via MCE. So corresponding function name and kernel log string
is changed.

But on some machines, PCI SERR line is still used to report memory
errors. This is used by EDAC, so corresponding EDAC call is reserved.

v3:

- Merge adding symbol constants and renaming patches

v2:

- EDAC call in pci_serr_error is reserved.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/mach_traps.h |   12 ++++++++-
 arch/x86/kernel/traps.c           |   51 +++++++++++++++++++------------------
 2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mach_traps.h b/arch/x86/include/asm/mach_traps.h
index f792060..72a8b52 100644
--- a/arch/x86/include/asm/mach_traps.h
+++ b/arch/x86/include/asm/mach_traps.h
@@ -7,9 +7,19 @@
 
 #include <asm/mc146818rtc.h>
 
+#define NMI_REASON_PORT		0x61
+
+#define NMI_REASON_SERR		0x80
+#define NMI_REASON_IOCHK	0x40
+#define NMI_REASON_MASK		(NMI_REASON_SERR | NMI_REASON_IOCHK)
+
+#define NMI_REASON_CLEAR_SERR	0x04
+#define NMI_REASON_CLEAR_IOCHK	0x08
+#define NMI_REASON_CLEAR_MASK	0x0f
+
 static inline unsigned char get_nmi_reason(void)
 {
-	return inb(0x61);
+	return inb(NMI_REASON_PORT);
 }
 
 static inline void reassert_nmi(void)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bb6f041..10a43f5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -310,15 +310,15 @@ static int __init setup_unknown_nmi_panic(char *str)
 __setup("unknown_nmi_panic", setup_unknown_nmi_panic);
 
 static notrace __kprobes void
-mem_parity_error(unsigned char reason, struct pt_regs *regs)
+pci_serr_error(unsigned char reason, struct pt_regs *regs)
 {
-	printk(KERN_EMERG
-		"Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-			reason, smp_processor_id());
-
-	printk(KERN_EMERG
-		"You have some hardware problem, likely on the PCI bus.\n");
+	pr_emerg("NMI: PCI system error (SERR) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
 
+	/*
+	 * On some machines, PCI SERR line is used to report memory
+	 * errors. EDAC makes use of it.
+	 */
 #if defined(CONFIG_EDAC)
 	if (edac_handler_set()) {
 		edac_atomic_assert_error();
@@ -329,11 +329,11 @@ mem_parity_error(unsigned char reason, struct pt_regs *regs)
 	if (panic_on_unrecovered_nmi)
 		panic("NMI: Not continuing");
 
-	printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
+	pr_emerg("Dazed and confused, but trying to continue\n");
 
-	/* Clear and disable the memory parity error line. */
-	reason = (reason & 0xf) | 4;
-	outb(reason, 0x61);
+	/* Clear and disable the PCI SERR error line. */
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_SERR;
+	outb(reason, NMI_REASON_PORT);
 }
 
 static notrace __kprobes void
@@ -341,22 +341,24 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 {
 	unsigned long i;
 
-	printk(KERN_EMERG "NMI: IOCK error (debug interrupt?)\n");
+	pr_emerg(
+	"NMI: IOCK error (debug interrupt?) for reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
 	show_registers(regs);
 
 	if (panic_on_io_nmi)
 		panic("NMI IOCK error: Not continuing");
 
 	/* Re-enable the IOCK line, wait for a few seconds */
-	reason = (reason & 0xf) | 8;
-	outb(reason, 0x61);
+	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
 
 	i = 2000;
 	while (--i)
 		udelay(1000);
 
-	reason &= ~8;
-	outb(reason, 0x61);
+	reason &= ~NMI_REASON_CLEAR_IOCHK;
+	outb(reason, NMI_REASON_PORT);
 }
 
 static notrace __kprobes void
@@ -375,15 +377,14 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 		return;
 	}
 #endif
-	printk(KERN_EMERG
-		"Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-			reason, smp_processor_id());
+	pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
+		 reason, smp_processor_id());
 
-	printk(KERN_EMERG "Do you have a strange power saving mode enabled?\n");
+	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
 		panic("NMI: Not continuing");
 
-	printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
+	pr_emerg("Dazed and confused, but trying to continue\n");
 }
 
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
@@ -397,7 +398,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	if (!cpu)
 		reason = get_nmi_reason();
 
-	if (!(reason & 0xc0)) {
+	if (!(reason & NMI_REASON_MASK)) {
 		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
 								== NOTIFY_STOP)
 			return;
@@ -415,9 +416,9 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		return;
 
 	/* AK: following checks seem to be broken on modern chipsets. FIXME */
-	if (reason & 0x80)
-		mem_parity_error(reason, regs);
-	if (reason & 0x40)
+	if (reason & NMI_REASON_SERR)
+		pci_serr_error(reason, regs);
+	if (reason & NMI_REASON_IOCHK)
 		io_check_error(reason, regs);
 #ifdef CONFIG_X86_32
 	/*
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/9] x86, NMI: Add touch_nmi_watchdog to io_check_error delay
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
  2010-11-30 22:27 ` [PATCH 1/9] x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 3/9] x86, NMI: Rewrite NMI handler Don Zickus
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

From: Huang Ying <ying.huang@intel.com>

Prevent the long delay in io_check_error making NMI watchdog timeout.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/traps.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 10a43f5..c7fd1ce 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -353,9 +353,11 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
 	outb(reason, NMI_REASON_PORT);
 
-	i = 2000;
-	while (--i)
-		udelay(1000);
+	i = 20000;
+	while (--i) {
+		touch_nmi_watchdog();
+		udelay(100);
+	}
 
 	reason &= ~NMI_REASON_CLEAR_IOCHK;
 	outb(reason, NMI_REASON_PORT);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/9] x86, NMI: Rewrite NMI handler
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
  2010-11-30 22:27 ` [PATCH 1/9] x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR Don Zickus
  2010-11-30 22:27 ` [PATCH 2/9] x86, NMI: Add touch_nmi_watchdog to io_check_error delay Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers Don Zickus
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

From: Huang Ying <ying.huang@intel.com>

The original NMI handler is quite outdated in many aspects. This patch
try to fix it.

The order to process the NMI sources are changed as follow:

notify_die(DIE_NMI_IPI);
notify_die(DIE_NMI);
/* process io port 0x61 */
nmi_watchdog_touch();
unknown_nmi();

DIE_NMI_IPI is used to process CPU specific NMI sources, such as perf
event, oprofile, crash IPI, etc. While DIE_NMI is used to process
non-CPU-specific NMI sources, such as APEI (ACPI Platform Error
Interface) GHES (Generic Hardware Error Source), etc. Non-CPU-specific
NMI sources can be processed on any CPU,

DIE_NMI_IPI must be processed before DIE_NMI. For example, perf event
trigger a NMI on CPU 1, at the same time, APEI GHES trigger another
NMI on CPU 0. If DIE_NMI is processed before DIE_NMI_IPI, it is
possible that APEI GHES is processed on CPU 1, while unknown NMI is
gotten on CPU 0.

In this new order of processing, performance sensitive NMI sources
such as oprofile or perf event will have better performance because
the time consuming IO port reading is done after them.

Only one NMI is eaten for each NMI handler call, even for PCI SERR and
IOCHK NMIs. Because one NMI should be raised for each of them, eating
too many NMI will cause unnecessary unknown NMI.

The die value used in NMI sources are fixed accordingly.

The NMI handler in the patch is designed by Andi Kleen.

v3:

- Make DIE_NMI and DIE_NMI_UNKNOWN work in more traditional way.

v2:

- Split process NMI reason (0x61) on non-BSP into another patch

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c  |    1 -
 arch/x86/kernel/traps.c           |   63 ++++++++++++++++++++-----------------
 arch/x86/oprofile/nmi_int.c       |    1 -
 arch/x86/oprofile/nmi_timer_int.c |    2 +-
 drivers/char/ipmi/ipmi_watchdog.c |    2 +-
 drivers/watchdog/hpwdt.c          |    2 +-
 6 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 817d2b1..48aa91f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1229,7 +1229,6 @@ perf_event_nmi_handler(struct notifier_block *self,
 		return NOTIFY_DONE;
 
 	switch (cmd) {
-	case DIE_NMI:
 	case DIE_NMI_IPI:
 		break;
 	case DIE_NMIUNKNOWN:
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c7fd1ce..5184b21 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -394,41 +394,46 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	unsigned char reason = 0;
 	int cpu;
 
-	cpu = smp_processor_id();
+	/*
+	 * CPU-specific NMI must be processed before non-CPU-specific
+	 * NMI, otherwise we may lose it, because the CPU-specific
+	 * NMI can not be detected/processed on other CPUs.
+	 */
+
+	/*
+	 * CPU-specific NMI: send to specific CPU or NMI sources must
+	 * be processed on specific CPU
+	 */
+	if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, 0, 2, SIGINT)
+	    == NOTIFY_STOP)
+		return;
 
+	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
+	cpu = smp_processor_id();
 	/* Only the BSP gets external NMIs from the system. */
-	if (!cpu)
+	if (!cpu) {
 		reason = get_nmi_reason();
-
-	if (!(reason & NMI_REASON_MASK)) {
-		if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
-								== NOTIFY_STOP)
-			return;
-
-#ifdef CONFIG_X86_LOCAL_APIC
-		if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
-							== NOTIFY_STOP)
-			return;
+		if (reason & NMI_REASON_MASK) {
+			if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+			    == NOTIFY_STOP)
+				return;
+			if (reason & NMI_REASON_SERR)
+				pci_serr_error(reason, regs);
+			else if (reason & NMI_REASON_IOCHK)
+				io_check_error(reason, regs);
+#ifdef CONFIG_X86_32
+			/*
+			 * Reassert NMI in case it became active
+			 * meanwhile as it's edge-triggered:
+			 */
+			reassert_nmi();
 #endif
-		unknown_nmi_error(reason, regs);
-
-		return;
+			return;
+		}
 	}
-	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
+	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
 		return;
-
-	/* AK: following checks seem to be broken on modern chipsets. FIXME */
-	if (reason & NMI_REASON_SERR)
-		pci_serr_error(reason, regs);
-	if (reason & NMI_REASON_IOCHK)
-		io_check_error(reason, regs);
-#ifdef CONFIG_X86_32
-	/*
-	 * Reassert NMI in case it became active meanwhile
-	 * as it's edge-triggered:
-	 */
-	reassert_nmi();
-#endif
+	unknown_nmi_error(reason, regs);
 }
 
 dotraplinkage notrace __kprobes void
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 4e8baad..ee7ff0e 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -64,7 +64,6 @@ static int profile_exceptions_notify(struct notifier_block *self,
 	int ret = NOTIFY_DONE;
 
 	switch (val) {
-	case DIE_NMI:
 	case DIE_NMI_IPI:
 		if (ctr_running)
 			model->check_ctrs(args->regs, &__get_cpu_var(cpu_msrs));
diff --git a/arch/x86/oprofile/nmi_timer_int.c b/arch/x86/oprofile/nmi_timer_int.c
index 0636dd9..fb40645 100644
--- a/arch/x86/oprofile/nmi_timer_int.c
+++ b/arch/x86/oprofile/nmi_timer_int.c
@@ -25,7 +25,7 @@ static int profile_timer_exceptions_notify(struct notifier_block *self,
 	int ret = NOTIFY_DONE;
 
 	switch (val) {
-	case DIE_NMI:
+	case DIE_NMI_IPI:
 		oprofile_add_sample(args->regs, 0);
 		ret = NOTIFY_STOP;
 		break;
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index f4d334f..320668f 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1081,7 +1081,7 @@ ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
 {
 	struct die_args *args = data;
 
-	if (val != DIE_NMI)
+	if (val != DIE_NMIUNKNOWN)
 		return NOTIFY_OK;
 
 	/* Hack, if it's a memory or I/O error, ignore it. */
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index c19f4a2..717c67b 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -469,7 +469,7 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
 	unsigned long rom_pl;
 	static int die_nmi_called;
 
-	if (ulReason != DIE_NMI && ulReason != DIE_NMI_IPI)
+	if (ulReason != DIE_NMIUNKNOWN)
 		goto out;
 
 	if (!hpwdt_nmi_decoding)
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (2 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 3/9] x86, NMI: Rewrite NMI handler Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-12-01 18:41   ` Cyrill Gorcunov
  2010-11-30 22:27 ` [PATCH 5/9] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU Don Zickus
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

When re-ordering how the NMI handles its callbacks, a conversation started
asking what DIE_NMI_IPI meant.  No one could answer it.

Noticing that is was wasteful to call the die_chain a second time with just
another argument, DIE_NMI_IPI, it was decided to nuke it and add priorities
to the die_chain handlers to maintain existing behaviour.

This patch replaces DIE_NMI_IPI with the appropriate option, mostly DIE_NMI.
Then it adds priorities to those handlers, using a globally defined set of
priorities for NMI.

The thought is eventually we will just switch the nmi handlers from the
die_chain to something more nmi specific.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/kdebug.h           |    1 -
 arch/x86/include/asm/nmi.h              |   20 ++++++++++++++++++++
 arch/x86/kernel/apic/hw_nmi.c           |    3 +--
 arch/x86/kernel/apic/x2apic_uv_x.c      |    2 +-
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    5 +++--
 arch/x86/kernel/cpu/perf_event.c        |    4 ++--
 arch/x86/kernel/kgdb.c                  |    6 +-----
 arch/x86/kernel/reboot.c                |    5 ++++-
 arch/x86/kernel/traps.c                 |    8 +-------
 arch/x86/oprofile/nmi_int.c             |    4 ++--
 arch/x86/oprofile/nmi_timer_int.c       |    4 ++--
 11 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index f23eb25..ca242d3 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -18,7 +18,6 @@ enum die_val {
 	DIE_TRAP,
 	DIE_GPF,
 	DIE_CALL,
-	DIE_NMI_IPI,
 	DIE_PAGE_FAULT,
 	DIE_NMIUNKNOWN,
 };
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 3545838..9c4ff0b 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -23,6 +23,26 @@ void arch_trigger_all_cpu_backtrace(void);
 #define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
 #endif
 
+/*
+ * Define some priorities for the nmi notifier call chain.
+ *
+ * Create a local nmi bit that has a higher priority than
+ * external nmis, because the local ones are more frequent.
+ * 
+ * Also setup some default high/normal/low settings for
+ * subsystems to registers with.  Using 4 bits to seperate
+ * the priorities.  This can go alot higher if needed be.
+ */
+
+#define NMI_LOCAL_SHIFT		16	/* randomly picked */
+#define NMI_LOCAL_BIT		(1ULL << NMI_LOCAL_SHIFT)
+#define NMI_HIGH_PRIOR		(1ULL << 8)
+#define NMI_NORMAL_PRIOR	(1ULL << 4)
+#define NMI_LOW_PRIOR		(1ULL << 0)
+#define NMI_LOCAL_HIGH_PRIOR	(NMI_LOCAL_BIT | NMI_HIGH_PRIOR)
+#define NMI_LOCAL_NORMAL_PRIOR	(NMI_LOCAL_BIT | NMI_NORMAL_PRIOR)
+#define NMI_LOCAL_LOW_PRIOR	(NMI_LOCAL_BIT | NMI_LOW_PRIOR)
+
 void stop_nmi(void);
 void restart_nmi(void);
 
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index a86840c..82bff25 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -63,7 +63,6 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 
 	switch (cmd) {
 	case DIE_NMI:
-	case DIE_NMI_IPI:
 		break;
 
 	default:
@@ -90,7 +89,7 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 static __read_mostly struct notifier_block backtrace_notifier = {
 	.notifier_call          = arch_trigger_all_cpu_backtrace_handler,
 	.next                   = NULL,
-	.priority               = 1
+	.priority               = NMI_LOCAL_LOW_PRIOR,
 };
 
 static int __init register_trigger_all_cpu_backtrace(void)
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index c1c52c3..927902d 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -639,7 +639,7 @@ void __cpuinit uv_cpu_init(void)
  */
 int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
 {
-	if (reason != DIE_NMI_IPI)
+	if (reason != DIE_NMIUNKNOWN)
 		return NOTIFY_OK;
 
 	if (in_crash_kexec)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index e7dbde7..a779719 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -25,6 +25,7 @@
 #include <linux/gfp.h>
 #include <asm/mce.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 
 /* Update fake mce registers on current CPU. */
 static void inject_mce(struct mce *m)
@@ -83,7 +84,7 @@ static int mce_raise_notify(struct notifier_block *self,
 	struct die_args *args = (struct die_args *)data;
 	int cpu = smp_processor_id();
 	struct mce *m = &__get_cpu_var(injectm);
-	if (val != DIE_NMI_IPI || !cpumask_test_cpu(cpu, mce_inject_cpumask))
+	if (val != DIE_NMI || !cpumask_test_cpu(cpu, mce_inject_cpumask))
 		return NOTIFY_DONE;
 	cpumask_clear_cpu(cpu, mce_inject_cpumask);
 	if (m->inject_flags & MCJ_EXCEPTION)
@@ -95,7 +96,7 @@ static int mce_raise_notify(struct notifier_block *self,
 
 static struct notifier_block mce_raise_nb = {
 	.notifier_call = mce_raise_notify,
-	.priority = 1000,
+	.priority = NMI_LOCAL_NORMAL_PRIOR,
 };
 
 /* Inject mce on current CPU */
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 48aa91f..7b91396 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1229,7 +1229,7 @@ perf_event_nmi_handler(struct notifier_block *self,
 		return NOTIFY_DONE;
 
 	switch (cmd) {
-	case DIE_NMI_IPI:
+	case DIE_NMI:
 		break;
 	case DIE_NMIUNKNOWN:
 		this_nmi = percpu_read(irq_stat.__nmi_count);
@@ -1279,7 +1279,7 @@ perf_event_nmi_handler(struct notifier_block *self,
 static __read_mostly struct notifier_block perf_event_nmi_notifier = {
 	.notifier_call		= perf_event_nmi_handler,
 	.next			= NULL,
-	.priority		= 1
+	.priority		= NMI_LOCAL_LOW_PRIOR,
 };
 
 static struct event_constraint unconstrained;
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index cd21b65..1e8b583 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -525,10 +525,6 @@ static int __kgdb_notify(struct die_args *args, unsigned long cmd)
 		}
 		return NOTIFY_DONE;
 
-	case DIE_NMI_IPI:
-		/* Just ignore, we will handle the roundup on DIE_NMI. */
-		return NOTIFY_DONE;
-
 	case DIE_NMIUNKNOWN:
 		if (was_in_debug_nmi[raw_smp_processor_id()]) {
 			was_in_debug_nmi[raw_smp_processor_id()] = 0;
@@ -606,7 +602,7 @@ static struct notifier_block kgdb_notifier = {
 	/*
 	 * Lowest-prio notifier priority, we want to be notified last:
 	 */
-	.priority	= -INT_MAX,
+	.priority	= NMI_LOCAL_LOW_PRIOR,
 };
 
 /**
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index c495aa8..fc7aae1 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -18,6 +18,7 @@
 #include <asm/pci_x86.h>
 #include <asm/virtext.h>
 #include <asm/cpu.h>
+#include <asm/nmi.h>
 
 #ifdef CONFIG_X86_32
 # include <linux/ctype.h>
@@ -747,7 +748,7 @@ static int crash_nmi_callback(struct notifier_block *self,
 {
 	int cpu;
 
-	if (val != DIE_NMI_IPI)
+	if (val != DIE_NMI)
 		return NOTIFY_OK;
 
 	cpu = raw_smp_processor_id();
@@ -778,6 +779,8 @@ static void smp_send_nmi_allbutself(void)
 
 static struct notifier_block crash_nmi_nb = {
 	.notifier_call = crash_nmi_callback,
+	/* we want to be the first one called */
+	.priority = NMI_LOCAL_HIGH_PRIOR+1,
 };
 
 /* Halt all other CPUs, calling the specified function on each of them
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 5184b21..0668b51 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -404,8 +404,7 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	 * CPU-specific NMI: send to specific CPU or NMI sources must
 	 * be processed on specific CPU
 	 */
-	if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, 0, 2, SIGINT)
-	    == NOTIFY_STOP)
+	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
 		return;
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
@@ -414,9 +413,6 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 	if (!cpu) {
 		reason = get_nmi_reason();
 		if (reason & NMI_REASON_MASK) {
-			if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
-			    == NOTIFY_STOP)
-				return;
 			if (reason & NMI_REASON_SERR)
 				pci_serr_error(reason, regs);
 			else if (reason & NMI_REASON_IOCHK)
@@ -431,8 +427,6 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 			return;
 		}
 	}
-	if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
-		return;
 	unknown_nmi_error(reason, regs);
 }
 
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index ee7ff0e..e476044 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -64,7 +64,7 @@ static int profile_exceptions_notify(struct notifier_block *self,
 	int ret = NOTIFY_DONE;
 
 	switch (val) {
-	case DIE_NMI_IPI:
+	case DIE_NMI:
 		if (ctr_running)
 			model->check_ctrs(args->regs, &__get_cpu_var(cpu_msrs));
 		else if (!nmi_enabled)
@@ -360,7 +360,7 @@ static void nmi_cpu_setup(void *dummy)
 static struct notifier_block profile_exceptions_nb = {
 	.notifier_call = profile_exceptions_notify,
 	.next = NULL,
-	.priority = 2
+	.priority = NMI_LOCAL_LOW_PRIOR,
 };
 
 static void nmi_cpu_restore_registers(struct op_msrs *msrs)
diff --git a/arch/x86/oprofile/nmi_timer_int.c b/arch/x86/oprofile/nmi_timer_int.c
index fb40645..720bf5a 100644
--- a/arch/x86/oprofile/nmi_timer_int.c
+++ b/arch/x86/oprofile/nmi_timer_int.c
@@ -25,7 +25,7 @@ static int profile_timer_exceptions_notify(struct notifier_block *self,
 	int ret = NOTIFY_DONE;
 
 	switch (val) {
-	case DIE_NMI_IPI:
+	case DIE_NMI:
 		oprofile_add_sample(args->regs, 0);
 		ret = NOTIFY_STOP;
 		break;
@@ -38,7 +38,7 @@ static int profile_timer_exceptions_notify(struct notifier_block *self,
 static struct notifier_block profile_timer_exceptions_nb = {
 	.notifier_call = profile_timer_exceptions_notify,
 	.next = NULL,
-	.priority = 0
+	.priority = NMI_LOW_PRIOR,
 };
 
 static int timer_start(void)
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/9] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (3 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases Don Zickus
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

From: Huang Ying <ying.huang@intel.com>

In original NMI handler, NMI reason io port (0x61) is only processed
on BSP. This makes it impossible to hot-remove BSP. To solve the
issue, a raw spinlock is used to make the port can be processed on any
CPU.

[ added a unlock and return to NMI_REASON_MASK path -dcz]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>

---
 arch/x86/kernel/traps.c |   38 +++++++++++++++++++++-----------------
 1 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0668b51..1c21271 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -84,6 +84,11 @@ EXPORT_SYMBOL_GPL(used_vectors);
 static int ignore_nmis;
 
 int unknown_nmi_panic;
+/*
+ * Prevent NMI reason port (0x61) being accessed simultaneously, can
+ * only be used in NMI handler.
+ */
+static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
 
 static inline void conditional_sti(struct pt_regs *regs)
 {
@@ -392,7 +397,6 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
-	int cpu;
 
 	/*
 	 * CPU-specific NMI must be processed before non-CPU-specific
@@ -408,25 +412,25 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 		return;
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	cpu = smp_processor_id();
-	/* Only the BSP gets external NMIs from the system. */
-	if (!cpu) {
-		reason = get_nmi_reason();
-		if (reason & NMI_REASON_MASK) {
-			if (reason & NMI_REASON_SERR)
-				pci_serr_error(reason, regs);
-			else if (reason & NMI_REASON_IOCHK)
-				io_check_error(reason, regs);
+	raw_spin_lock(&nmi_reason_lock);
+	reason = get_nmi_reason();
+	if (reason & NMI_REASON_MASK) {
+		if (reason & NMI_REASON_SERR)
+			pci_serr_error(reason, regs);
+		else if (reason & NMI_REASON_IOCHK)
+			io_check_error(reason, regs);
 #ifdef CONFIG_X86_32
-			/*
-			 * Reassert NMI in case it became active
-			 * meanwhile as it's edge-triggered:
-			 */
-			reassert_nmi();
+		/*
+		 * Reassert NMI in case it became active
+		 * meanwhile as it's edge-triggered:
+		 */
+		reassert_nmi();
 #endif
-			return;
-		}
+		raw_spin_unlock(&nmi_reason_lock);
+		return;
 	}
+	raw_spin_unlock(&nmi_reason_lock);
+
 	unknown_nmi_error(reason, regs);
 }
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (4 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 5/9] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-12-01 18:07   ` Cyrill Gorcunov
  2010-11-30 22:27 ` [PATCH 7/9] x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time Don Zickus
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

There are some paths that walk the die_chain with preemption on.
Make sure we are in an NMI call before we start doing anything.

This was triggered by do_general_protection calling notify_die with
DIE_GPF.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/apic/hw_nmi.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 82bff25..ddb29bf 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -59,7 +59,7 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 {
 	struct die_args *args = __args;
 	struct pt_regs *regs;
-	int cpu = smp_processor_id();
+	int cpu;
 
 	switch (cmd) {
 	case DIE_NMI:
@@ -70,6 +70,7 @@ arch_trigger_all_cpu_backtrace_handler(struct notifier_block *self,
 	}
 
 	regs = args->regs;
+	cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 		static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/9] x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (5 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 8/9] panic: ratelimit panic messages Don Zickus
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Dongdong Deng, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Don Zickus

From: Dongdong Deng <dongdong.deng@windriver.com>

The spin_lock_debug/rcu_cpu_stall detector uses
trigger_all_cpu_backtrace() to dump cpu backtrace.
Therefore it is possible that trigger_all_cpu_backtrace()
could be called at the same time on different CPUs, which
triggers and 'unknown reason NMI' warning. The following case
illustrates the problem:

      CPU1                    CPU2                     ...   CPU N
                       trigger_all_cpu_backtrace()
                       set "backtrace_mask" to cpu mask
                               |
generate NMI interrupts  generate NMI interrupts       ...
    \                          |                               /
     \                         |                              /

The "backtrace_mask" will be cleaned by the first NMI interrupt
at nmi_watchdog_tick(), then the following NMI interrupts generated
by other cpus's arch_trigger_all_cpu_backtrace() will be taken as
unknown reason NMI interrupts.

This patch uses a test_and_set to avoid the problem, and stop the
arch_trigger_all_cpu_backtrace() from calling to avoid dumping a
double cpu backtrace info when there is already a
trigger_all_cpu_backtrace() in progress.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/apic/hw_nmi.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index ddb29bf..b36dd01 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -36,10 +36,20 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
 /* For reliability, we're prepared to waste bits here. */
 static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
 
+/* "in progress" flag of arch_trigger_all_cpu_backtrace */
+static unsigned long backtrace_flag;
+
 void arch_trigger_all_cpu_backtrace(void)
 {
 	int i;
 
+	if (test_and_set_bit(0, &backtrace_flag))
+		/*
+		 * If there is already a trigger_all_cpu_backtrace() in progress
+		 * (backtrace_flag == 1), don't output double cpu dump infos.
+		 */
+		return;
+
 	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
 
 	printk(KERN_INFO "sending NMI to all CPUs:\n");
@@ -51,6 +61,9 @@ void arch_trigger_all_cpu_backtrace(void)
 			break;
 		mdelay(1);
 	}
+
+	clear_bit(0, &backtrace_flag);
+	smp_mb__after_clear_bit();
 }
 
 static int __kprobes
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 8/9] panic:  ratelimit panic messages
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (6 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 7/9] x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-11-30 22:27 ` [PATCH 9/9] watchdog: touch_nmi_watchdog should only touch local cpu not every one Don Zickus
  2010-12-22  3:16 ` [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Huang Ying
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

Sometimes when things go bad, so much spew is coming on the console it is hard
to figure out what happened.  This patch allows you to ratelimit the panic
messages with the intent that the first panic message will provide the info
we need to figure out what happened.

Adds new kernel param 'panic_ratelimit=on/<integer in seconds>'

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 Documentation/kernel-parameters.txt |    6 ++++++
 kernel/panic.c                      |   30 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 5e55e46..79d1df8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1807,6 +1807,12 @@ and is between 256 and 4096 characters. It is defined in the file
 	panic=		[KNL] Kernel behaviour on panic
 			Format: <timeout>
 
+	panic_ratelimit=	[KNL] ratelimit the panic messages
+			Useful for slowing down multiple panics to capture
+			the first one before it scrolls off the screen
+			Format: "on" or some integer in seconds
+			"on" defaults to 10 minutes
+
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
 			Format: <parport#>
diff --git a/kernel/panic.c b/kernel/panic.c
index 4c13b1a..fe89e04 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/nmi.h>
 #include <linux/dmi.h>
+#include <linux/ratelimit.h>
 
 #define PANIC_TIMER_STEP 100
 #define PANIC_BLINK_SPD 18
@@ -48,6 +49,31 @@ static long no_blink(int state)
 long (*panic_blink)(int state);
 EXPORT_SYMBOL(panic_blink);
 
+/* setting default to 0 effectively disables it */
+DEFINE_RATELIMIT_STATE(panic_ratelimit_state, 0, 1);
+
+static int __init panic_ratelimit_setup(char *str)
+{
+	int interval;
+
+	if (!strncmp(str, "on", 2))
+		/* default to 10 minutes */
+		interval = 600 * HZ;
+	else
+		interval = simple_strtoul(str, NULL, 0) * HZ;
+
+	panic_ratelimit_state.interval = interval;
+	return 1;
+}
+__setup("panic_ratelimit=", panic_ratelimit_setup);
+
+static int __panic_ratelimit(const char *func)
+{
+	return ___ratelimit(&panic_ratelimit_state, func);
+}
+
+#define panic_ratelimit() __panic_ratelimit(__func__)
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -63,6 +89,10 @@ NORET_TYPE void panic(const char * fmt, ...)
 	long i, i_next = 0;
 	int state = 0;
 
+	if (!panic_ratelimit())
+		for(;;)
+			cpu_relax();
+
 	/*
 	 * It's possible to come here directly from a panic-assertion and
 	 * not have preempt disabled. Some functions called from here want
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 9/9] watchdog:  touch_nmi_watchdog should only touch local cpu not every one
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (7 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 8/9] panic: ratelimit panic messages Don Zickus
@ 2010-11-30 22:27 ` Don Zickus
  2010-12-22  3:16 ` [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Huang Ying
  9 siblings, 0 replies; 17+ messages in thread
From: Don Zickus @ 2010-11-30 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Robert Richter, ying.huang, Andi Kleen, gorcunov,
	LKML, Don Zickus

I ran into a scenario where while one cpu was stuck and should have panic'd
because of the NMI watchdog, it didn't.  The reason was another cpu was spewing
stack dumps on to the console.  Upon investigation, I noticed that when writing
to the console and also when dumping the stack, the watchdog is touched.

This causes all the cpus to reset their NMI watchdog flags and the 'stuck' cpu
just spins forever.

This change causes the semantics of touch_nmi_watchdog to be changed slightly.
Previously, I accidentally changed the semantics and we noticed there was a
codepath in which touch_nmi_watchdog could be touched from a preemtible area.
That caused a BUG() to happen when CONFIG_DEBUG_PREEMPT was enabled.  I believe
it was the acpi code.

My attempt here re-introduces the change to have the touch_nmi_watchdog() code
only touch the local cpu instead of all of the cpus.  But instead of using
__get_cpu_var(), I use the __raw_get_cpu_var() version.

This avoids the preemption problem.  However my reasoning wasn't because I was
trying to be lazy.  Instead I rationalized it as, well if preemption is enabled
then interrupts should be enabled to and the NMI watchdog will have no reason
to trigger.  So it won't matter if the wrong cpu is touched because the percpu
interrupt counters the NMI watchdog uses should still be incrementing.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e0f44dc..792a4ed 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -143,14 +143,15 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	if (watchdog_enabled) {
-		unsigned cpu;
+	/*
+	 * Using __raw here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	__raw_get_cpu_var(watchdog_nmi_touch) = true;
 
-		for_each_present_cpu(cpu) {
-			if (per_cpu(watchdog_nmi_touch, cpu) != true)
-				per_cpu(watchdog_nmi_touch, cpu) = true;
-		}
-	}
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases
  2010-11-30 22:27 ` [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases Don Zickus
@ 2010-12-01 18:07   ` Cyrill Gorcunov
  0 siblings, 0 replies; 17+ messages in thread
From: Cyrill Gorcunov @ 2010-12-01 18:07 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, ying.huang,
	Andi Kleen, LKML

On Tue, Nov 30, 2010 at 05:27:27PM -0500, Don Zickus wrote:
> There are some paths that walk the die_chain with preemption on.
> Make sure we are in an NMI call before we start doing anything.
> 
> This was triggered by do_general_protection calling notify_die with
> DIE_GPF.
> 
> Reported-by: Jan Kiszka <jan.kiszka@web.de>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
>  arch/x86/kernel/apic/hw_nmi.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

yup, thanks Don!
 
  Cyrill

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-11-30 22:27 ` [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers Don Zickus
@ 2010-12-01 18:41   ` Cyrill Gorcunov
  2010-12-01 18:53     ` Peter Zijlstra
  2010-12-01 21:30     ` Andi Kleen
  0 siblings, 2 replies; 17+ messages in thread
From: Cyrill Gorcunov @ 2010-12-01 18:41 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, ying.huang,
	Andi Kleen, LKML

On Tue, Nov 30, 2010 at 05:27:25PM -0500, Don Zickus wrote:
> When re-ordering how the NMI handles its callbacks, a conversation started
> asking what DIE_NMI_IPI meant.  No one could answer it.

It should have came from commit

 | commit c4b2bffee2a4115fed2825530f2b906ee2f17bd7
 | Author: Andi Kleen <ak@suse.de>
 | Date:   Fri Jan 23 18:46:40 2004 -0800
 |
 |   [PATCH] x86-64 merge
 |   
 |   Mainly lots of bug fixes and a few minor features. One change is that
 |   it uses drivers/Kconfig now like i386. This requires a few minor changes in
 |   outside Kconfig files which I am sending separately.
 ...

Andi do you remember what the initial idea was? Didn't find any user of it
even in this old commit. Just curious.

> 
> Noticing that is was wasteful to call the die_chain a second time with just
> another argument, DIE_NMI_IPI, it was decided to nuke it and add priorities
> to the die_chain handlers to maintain existing behaviour.
> 
> This patch replaces DIE_NMI_IPI with the appropriate option, mostly DIE_NMI.
> Then it adds priorities to those handlers, using a globally defined set of
> priorities for NMI.
> 
> The thought is eventually we will just switch the nmi handlers from the
> die_chain to something more nmi specific.
> 
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---

 Don, maybe switching to say new chains like chain_perf and friends would be
more readable/clean? I'm not against this patch by any means, but just a thought ;)

 Ie I thought like

 default_do_nmi
   if (!(reason & 0xc0)) {
     if (notify_perf() == NOTIFY_STOP)
       return
     if (notify_die() == NOTIFY_STOP)
       return
    ...
   }

 Or there is something obvious I'm missing?

 Again, just a thought.

 Cyrill

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-12-01 18:41   ` Cyrill Gorcunov
@ 2010-12-01 18:53     ` Peter Zijlstra
  2010-12-01 19:01       ` Cyrill Gorcunov
  2010-12-01 21:30     ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-12-01 18:53 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Don Zickus, Ingo Molnar, Robert Richter, ying.huang, Andi Kleen,
	LKML

On Wed, 2010-12-01 at 21:41 +0300, Cyrill Gorcunov wrote:
> On Tue, Nov 30, 2010 at 05:27:25PM -0500, Don Zickus wrote:
> > When re-ordering how the NMI handles its callbacks, a conversation started
> > asking what DIE_NMI_IPI meant.  No one could answer it.
> 
> It should have came from commit
> 
>  | commit c4b2bffee2a4115fed2825530f2b906ee2f17bd7
>  | Author: Andi Kleen <ak@suse.de>
>  | Date:   Fri Jan 23 18:46:40 2004 -0800
>  |
>  |   [PATCH] x86-64 merge
>  |   
>  |   Mainly lots of bug fixes and a few minor features. One change is that
>  |   it uses drivers/Kconfig now like i386. This requires a few minor changes in
>  |   outside Kconfig files which I am sending separately.
>  ...
> 
> Andi do you remember what the initial idea was? Didn't find any user of it
> even in this old commit. Just curious.
> 
> > 
> > Noticing that is was wasteful to call the die_chain a second time with just
> > another argument, DIE_NMI_IPI, it was decided to nuke it and add priorities
> > to the die_chain handlers to maintain existing behaviour.
> > 
> > This patch replaces DIE_NMI_IPI with the appropriate option, mostly DIE_NMI.
> > Then it adds priorities to those handlers, using a globally defined set of
> > priorities for NMI.
> > 
> > The thought is eventually we will just switch the nmi handlers from the
> > die_chain to something more nmi specific.
> > 
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> 
>  Don, maybe switching to say new chains like chain_perf and friends would be
> more readable/clean? I'm not against this patch by any means, but just a thought ;)

Its a single event (NMI) so we only need a single notifier list, the
current one seems to be just fine.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-12-01 18:53     ` Peter Zijlstra
@ 2010-12-01 19:01       ` Cyrill Gorcunov
  0 siblings, 0 replies; 17+ messages in thread
From: Cyrill Gorcunov @ 2010-12-01 19:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, Ingo Molnar, Robert Richter, ying.huang, Andi Kleen,
	LKML

On Wed, Dec 01, 2010 at 07:53:42PM +0100, Peter Zijlstra wrote:
...
> > 
> >  Don, maybe switching to say new chains like chain_perf and friends would be
> > more readable/clean? I'm not against this patch by any means, but just a thought ;)
> 
> Its a single event (NMI) so we only need a single notifier list, the
> current one seems to be just fine.
> 

 OK, i'm fine with either, just a thought

  Cyrill

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-12-01 18:41   ` Cyrill Gorcunov
  2010-12-01 18:53     ` Peter Zijlstra
@ 2010-12-01 21:30     ` Andi Kleen
  2010-12-01 21:35       ` Cyrill Gorcunov
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2010-12-01 21:30 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Don Zickus, Ingo Molnar, Peter Zijlstra, Robert Richter,
	ying.huang, Andi Kleen, LKML

On Wed, Dec 01, 2010 at 09:41:28PM +0300, Cyrill Gorcunov wrote:
> On Tue, Nov 30, 2010 at 05:27:25PM -0500, Don Zickus wrote:
> > When re-ordering how the NMI handles its callbacks, a conversation started
> > asking what DIE_NMI_IPI meant.  No one could answer it.
> 
> It should have came from commit
> 
>  | commit c4b2bffee2a4115fed2825530f2b906ee2f17bd7
>  | Author: Andi Kleen <ak@suse.de>
>  | Date:   Fri Jan 23 18:46:40 2004 -0800
>  |
>  |   [PATCH] x86-64 merge
>  |   
>  |   Mainly lots of bug fixes and a few minor features. One change is that
>  |   it uses drivers/Kconfig now like i386. This requires a few minor changes in
>  |   outside Kconfig files which I am sending separately.
>  ...
> 
> Andi do you remember what the initial idea was? Didn't find any user of it
> even in this old commit. Just curious.

The original die names were pretty much a 1:1 conversion of the hooks
used by both the external KDB and KGDB patchkits floating around
at that time.

IIRC DIE_NMI_IPI was the one that was early in the NMI handler
and DIE_NMI late when everything else failed.

So you could use NMI_IPI when you just wanted to stop
all CPUs with a broadcast NMI and can check that reliable
through some memory location, and NMI when you wanted
to drop into the debugger as a last resort.


-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
  2010-12-01 21:30     ` Andi Kleen
@ 2010-12-01 21:35       ` Cyrill Gorcunov
  0 siblings, 0 replies; 17+ messages in thread
From: Cyrill Gorcunov @ 2010-12-01 21:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Don Zickus, Ingo Molnar, Peter Zijlstra, Robert Richter,
	ying.huang, LKML

On Wed, Dec 01, 2010 at 10:30:34PM +0100, Andi Kleen wrote:
> On Wed, Dec 01, 2010 at 09:41:28PM +0300, Cyrill Gorcunov wrote:
> > On Tue, Nov 30, 2010 at 05:27:25PM -0500, Don Zickus wrote:
> > > When re-ordering how the NMI handles its callbacks, a conversation started
> > > asking what DIE_NMI_IPI meant.  No one could answer it.
> > 
...
> > Andi do you remember what the initial idea was? Didn't find any user of it
> > even in this old commit. Just curious.
> 
> The original die names were pretty much a 1:1 conversion of the hooks
> used by both the external KDB and KGDB patchkits floating around
> at that time.
> 
> IIRC DIE_NMI_IPI was the one that was early in the NMI handler
> and DIE_NMI late when everything else failed.
> 
> So you could use NMI_IPI when you just wanted to stop
> all CPUs with a broadcast NMI and can check that reliable
> through some memory location, and NMI when you wanted
> to drop into the debugger as a last resort.
> 
> 
> -Andi
>

Thanks for info Andi, good to know.
 
  Cyrill

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift
  2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
                   ` (8 preceding siblings ...)
  2010-11-30 22:27 ` [PATCH 9/9] watchdog: touch_nmi_watchdog should only touch local cpu not every one Don Zickus
@ 2010-12-22  3:16 ` Huang Ying
  9 siblings, 0 replies; 17+ messages in thread
From: Huang Ying @ 2010-12-22  3:16 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Andi Kleen,
	gorcunov@gmail.com, LKML

Hi, Don,

It seems that this patchset has not been merged yet.  Give it a retry?
Busy on watchdog issue?

Best Regards,
Huang Ying

On Wed, 2010-12-01 at 06:27 +0800, Don Zickus wrote:
> Restructuring the nmi handler to be more readable and simpler.
> 
> This is just laying the ground work for future improvements in this area.
> 
> I also left out one of Huang's patch until we figure out how we are going
> to proceed with a new notifier.
> 
> Tested 32-bit and 64-bit on AMD and Intel machines.
> 
> V3:  rebased on top of removal of nmi watchdog
>      add a couple more watchdog patches from my queue
> 
> V2:  add a patch to kill DIE_NMI_IPI and add in priorities
> 
> Cheers,
> Don
> 
> Don Zickus (5):
>   x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers
>   x86: only call smp_processor_id in non-preempt cases
>   panic:  ratelimit panic messages
>   watchdog:  touch_nmi_watchdog should only touch local cpu not every
>     one
> 
> Dongdong Deng (1):
>   x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
> 
> Huang Ying (3):
>   x86, NMI: Add NMI symbol constants and rename memory parity to PCI
>     SERR
>   x86, NMI: Add touch_nmi_watchdog to io_check_error delay
>   x86, NMI: Rewrite NMI handler
>   x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
> 
>  Documentation/kernel-parameters.txt     |    6 ++
>  arch/x86/include/asm/kdebug.h           |    1 -
>  arch/x86/include/asm/mach_traps.h       |   12 +++-
>  arch/x86/include/asm/nmi.h              |   20 ++++++
>  arch/x86/kernel/apic/hw_nmi.c           |   19 ++++-
>  arch/x86/kernel/apic/x2apic_uv_x.c      |    2 +-
>  arch/x86/kernel/cpu/mcheck/mce-inject.c |    5 +-
>  arch/x86/kernel/cpu/perf_event.c        |    3 +-
>  arch/x86/kernel/kgdb.c                  |    6 +--
>  arch/x86/kernel/reboot.c                |    5 +-
>  arch/x86/kernel/traps.c                 |  114 ++++++++++++++++---------------
>  arch/x86/oprofile/nmi_int.c             |    3 +-
>  arch/x86/oprofile/nmi_timer_int.c       |    2 +-
>  drivers/char/ipmi/ipmi_watchdog.c       |    2 +-
>  drivers/watchdog/hpwdt.c                |    2 +-
>  kernel/panic.c                          |   30 ++++++++
>  kernel/watchdog.c                       |   17 +++--
>  17 files changed, 166 insertions(+), 83 deletions(-)
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-12-22  3:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-30 22:27 [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Don Zickus
2010-11-30 22:27 ` [PATCH 1/9] x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR Don Zickus
2010-11-30 22:27 ` [PATCH 2/9] x86, NMI: Add touch_nmi_watchdog to io_check_error delay Don Zickus
2010-11-30 22:27 ` [PATCH 3/9] x86, NMI: Rewrite NMI handler Don Zickus
2010-11-30 22:27 ` [PATCH 4/9] x86, NMI: Remove DIE_NMI_IPI and add priorties to handlers Don Zickus
2010-12-01 18:41   ` Cyrill Gorcunov
2010-12-01 18:53     ` Peter Zijlstra
2010-12-01 19:01       ` Cyrill Gorcunov
2010-12-01 21:30     ` Andi Kleen
2010-12-01 21:35       ` Cyrill Gorcunov
2010-11-30 22:27 ` [PATCH 5/9] x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU Don Zickus
2010-11-30 22:27 ` [PATCH 6/9] x86: only call smp_processor_id in non-preempt cases Don Zickus
2010-12-01 18:07   ` Cyrill Gorcunov
2010-11-30 22:27 ` [PATCH 7/9] x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time Don Zickus
2010-11-30 22:27 ` [PATCH 8/9] panic: ratelimit panic messages Don Zickus
2010-11-30 22:27 ` [PATCH 9/9] watchdog: touch_nmi_watchdog should only touch local cpu not every one Don Zickus
2010-12-22  3:16 ` [V3 PATCH 0/9] x86, NMI: give NMI handler a face-lift Huang Ying

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).