public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86, perf_p4:  block PMIs on init to prevent a stream of unkown NMIs
@ 2014-01-17 15:41 Don Zickus
  2014-01-17 17:11 ` Cyrill Gorcunov
  2014-01-20  8:38 ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Don Zickus @ 2014-01-17 15:41 UTC
  To: LKML; +Cc: Don Zickus, Dave Young, Vivek Goyal, Cyrill Gorcunov,
	Peter Zijlstra

A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What is happening is on boot up of the second kernel (the kdump one),
the previous nmi_watchdogs were enabled on thread 0 and thread 1.  The
second kernel only initializes one cpu but the perf counter on thread 1
still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

On its own this would be fine: it is a kdump kernel, strange things happen,
and a single unknown NMI is no big deal.

Unfortunately, the P4 comes with another quirk: the overflow bit has to be
cleared explicitly, otherwise the PMI keeps firing and produces a stream of
NMIs.  This is the problem.

The kdump kernel cannot execute because of the endless NMIs that result.

To solve this, I instrumented the p4 perf init code to walk all the counters
and explicitly clear any overflow bits and, more importantly, disable the
counters' ability to generate a PMI.

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I could have removed the ENABLE bit too, but was worried it would impact
BIOS vendors' secret ability to monitor cpu states.  I figured the ability to
generate a PMI or not is not interesting to them and chose that route instead.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/cpu/perf_event_p4.c |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3486e66..cff30ab 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1322,6 +1322,8 @@ static __initconst const struct x86_pmu p4_pmu = {
 __init int p4_pmu_init(void)
 {
 	unsigned int low, high;
+	u64 val;
+	int i, reg;
 
 	/* If we get stripped -- indexing fails */
 	BUILD_BUG_ON(ARCH_P4_MAX_CCCR > INTEL_PMC_MAX_GENERIC);
@@ -1340,5 +1342,29 @@ __init int p4_pmu_init(void)
 
 	x86_pmu = p4_pmu;
 
+	/*
+	 * Even though the counters are configured to interrupt a particular
+	 * logical processor when an overflow happens, testing has shown that
+	 * on kdump kernels (which uses a single cpu), thread1's counter
+	 * continues to run and will report an NMI on thread0.  Due to the
+	 * overflow bug, this leads to a stream of unknown NMIs.
+	 *
+	 * Solve this by disabling all counter's ability to generate a PMI.
+	 * Disabling the ENABLE bit would work too, but I was afraid that would
+	 * cause problems with BIOS vendors that secretly use the PMUs for data
+	 * analysis.  So keep the ENABLE bit on, but prevent PMIs from
+	 * happening.
+	 *
+	 * The clearing of the overflow is to prevent the scenario where an
+	 * overflow happened before the second kernel came up and the second
+	 * kernel blindly does an apic_write(LVTPC, APIC_DM_NMI), again causing
+	 * a stream of endless unknown NMIs.
+	 */
+        for (i = 0; i < x86_pmu.num_counters; i++) {
+		reg = x86_pmu_config_addr(i);
+		rdmsrl_safe(reg, &val);
+		wrmsrl_safe(reg, val & ~(P4_CCCR_OVF|P4_CCCR_OVF_PMI_T0|P4_CCCR_OVF_PMI_T1));
+        }
+
 	return 0;
 }
-- 
1.7.1
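
For readers who want to see the masking step from the patch above in
isolation, here is a minimal user-space sketch.  It is not kernel code: the
helper names cccr_block_pmi() and cccr_block_pmi_all() are hypothetical, and
the bit values are written down as I recall them from
arch/x86/include/asm/perf_event_p4.h, so treat them as assumptions and check
the header before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CCCR bit values as recalled from arch/x86/include/asm/perf_event_p4.h.
 * These are assumptions for illustration, not an authoritative copy.
 */
#define P4_CCCR_OVF        0x80000000ULL /* overflow flag */
#define P4_CCCR_OVF_PMI_T0 0x04000000ULL /* PMI to thread 0 on overflow */
#define P4_CCCR_OVF_PMI_T1 0x08000000ULL /* PMI to thread 1 on overflow */
#define P4_CCCR_ENABLE     0x00001000ULL /* counter enable */

/*
 * Hypothetical helper: the value the loop in the patch would write back
 * for one CCCR.  Overflow and both PMI bits are cleared; every other bit,
 * including ENABLE, is left as it was.
 */
static uint64_t cccr_block_pmi(uint64_t val)
{
	return val & ~(P4_CCCR_OVF | P4_CCCR_OVF_PMI_T0 | P4_CCCR_OVF_PMI_T1);
}

/* Hypothetical helper: apply the mask to an in-memory array of CCCR values. */
static void cccr_block_pmi_all(uint64_t *cccr, size_t n)
{
	size_t i;

	assert(cccr != NULL);
	for (i = 0; i < n; i++)
		cccr[i] = cccr_block_pmi(cccr[i]);
}
```

With these definitions, a value that has OVF, OVF_PMI_T1 and ENABLE set comes
back with only ENABLE set, which is the "keep counting, never interrupt"
behaviour the patch description is aiming for.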


* Re: [PATCH] x86, perf_p4:  block PMIs on init to prevent a stream of unkown NMIs
  2014-01-17 15:41 [PATCH] x86, perf_p4: block PMIs on init to prevent a stream of unkown NMIs Don Zickus
@ 2014-01-17 17:11 ` Cyrill Gorcunov
  2014-01-20  8:38 ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Cyrill Gorcunov @ 2014-01-17 17:11 UTC
  To: Don Zickus; +Cc: LKML, Dave Young, Vivek Goyal, Peter Zijlstra

On Fri, Jan 17, 2014 at 10:41:41AM -0500, Don Zickus wrote:
> I tested this on a P4 we have in our lab.  After two or three crashes, I could
> normally reproduce the problem.  Now after 10 crashes, everything continues
> to boot correctly.
> 
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Don Zickus <dzickus@redhat.com>

Looks good to me, thanks a lot Don!

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>

* Re: [PATCH] x86, perf_p4:  block PMIs on init to prevent a stream of unkown NMIs
  2014-01-17 15:41 [PATCH] x86, perf_p4: block PMIs on init to prevent a stream of unkown NMIs Don Zickus
  2014-01-17 17:11 ` Cyrill Gorcunov
@ 2014-01-20  8:38 ` Peter Zijlstra
  2014-01-20 15:41   ` Don Zickus
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2014-01-20  8:38 UTC
  To: Don Zickus; +Cc: LKML, Dave Young, Vivek Goyal, Cyrill Gorcunov

On Fri, Jan 17, 2014 at 10:41:41AM -0500, Don Zickus wrote:
> I could have removed the ENABLE bit too, but was worried it would impact
> BIOS vendors' secret ability to monitor cpu states.  I figured the ability to
> generate a PMI or not is not interesting to them and chose that route instead.


You worry about the wrong things, just clear the things.

* Re: [PATCH] x86, perf_p4:  block PMIs on init to prevent a stream of unkown NMIs
  2014-01-20  8:38 ` Peter Zijlstra
@ 2014-01-20 15:41   ` Don Zickus
  2014-02-10 13:29     ` [tip:perf/core] perf/x86/p4: Block " tip-bot for Don Zickus
  0 siblings, 1 reply; 5+ messages in thread
From: Don Zickus @ 2014-01-20 15:41 UTC
  To: Peter Zijlstra; +Cc: LKML, Dave Young, Vivek Goyal, Cyrill Gorcunov

On Mon, Jan 20, 2014 at 09:38:59AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 10:41:41AM -0500, Don Zickus wrote:
> > I could have removed the ENABLE bit too, but was worried it would impact
> > BIOS vendors' secret ability to monitor cpu states.  I figured the ability to
> > generate a PMI or not is not interesting to them and chose that route instead.
> 
> 
> You worry about the wrong things, just clear the things.

Like this?

Cheers,
Don

----------------------8<-------------
From: Don Zickus <dzickus@redhat.com>
Date: Fri, 17 Jan 2014 10:23:53 -0500
Subject: [PATCH v2] x86, perf_p4:  block PMIs on init to prevent a stream of unkown NMIs

A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What is happening is on boot up of the second kernel (the kdump one),
the previous nmi_watchdogs were enabled on thread 0 and thread 1.  The
second kernel only initializes one cpu but the perf counter on thread 1
still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

On its own this would be fine: it is a kdump kernel, strange things happen,
and a single unknown NMI is no big deal.

Unfortunately, the P4 comes with another quirk: the overflow bit has to be
cleared explicitly, otherwise the PMI keeps firing and produces a stream of
NMIs.  This is the problem.

The kdump kernel cannot execute because of the endless NMIs that result.

To solve this, I instrumented the p4 perf init code to walk all the counters
and zero out their configuration registers (just like a normal reset would).

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/cpu/perf_event_p4.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

V2 - zero out the register per Peter's suggestion.

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3486e66..075f18c 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1322,6 +1322,7 @@ static __initconst const struct x86_pmu p4_pmu = {
 __init int p4_pmu_init(void)
 {
 	unsigned int low, high;
+	int i, reg;
 
 	/* If we get stripped -- indexing fails */
 	BUILD_BUG_ON(ARCH_P4_MAX_CCCR > INTEL_PMC_MAX_GENERIC);
@@ -1340,5 +1341,19 @@ __init int p4_pmu_init(void)
 
 	x86_pmu = p4_pmu;
 
+	/*
+	 * Even though the counters are configured to interrupt a particular
+	 * logical processor when an overflow happens, testing has shown that
+	 * on kdump kernels (which uses a single cpu), thread1's counter
+	 * continues to run and will report an NMI on thread0.  Due to the
+	 * overflow bug, this leads to a stream of unknown NMIs.
+	 *
+	 * Solve this by zero'ing out the registers to mimic a reset.
+	 */
+        for (i = 0; i < x86_pmu.num_counters; i++) {
+		reg = x86_pmu_config_addr(i);
+		wrmsrl_safe(reg, 0ULL);
+        }
+
 	return 0;
 }
-- 
1.7.1
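
As a rough illustration of how the v2 loop differs from the masking in v1,
here is a user-space sketch that zeroes a simulated bank of config registers.
Everything in it is a stand-in: struct fake_pmu, fake_wrmsrl_safe() and
fake_pmu_reset() are hypothetical names, and FAKE_NUM_COUNTERS is set to 18
on the assumption that it mirrors ARCH_P4_MAX_CCCR.  Unlike the patch, the
sketch also counts failed writes, which is only there to make the behaviour
easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_NUM_COUNTERS 18 /* assumption: mirrors ARCH_P4_MAX_CCCR */

/* Hypothetical stand-in for the per-counter config MSRs (the CCCRs). */
struct fake_pmu {
	uint64_t cccr[FAKE_NUM_COUNTERS];
};

/*
 * Hypothetical stand-in for wrmsrl_safe(): returns 0 on success and a
 * nonzero value when the index does not name a simulated register.
 */
static int fake_wrmsrl_safe(struct fake_pmu *pmu, int idx, uint64_t val)
{
	if (idx < 0 || idx >= FAKE_NUM_COUNTERS)
		return -1;
	pmu->cccr[idx] = val;
	return 0;
}

/*
 * Same shape as the v2 loop: write zero to every config register, so no
 * ENABLE, overflow or PMI bit survives.  Returns the number of writes
 * that failed.
 */
static int fake_pmu_reset(struct fake_pmu *pmu)
{
	int i, failed = 0;

	assert(pmu != NULL);
	for (i = 0; i < FAKE_NUM_COUNTERS; i++) {
		if (fake_wrmsrl_safe(pmu, i, 0ULL))
			failed++;
	}
	return failed;
}
```

The design trade-off is the one discussed in the thread: v1 preserved ENABLE
and cleared only three bits, while v2 drops the read-modify-write and leaves
every simulated register at zero after the call.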


* [tip:perf/core] perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIs
  2014-01-20 15:41   ` Don Zickus
@ 2014-02-10 13:29     ` tip-bot for Don Zickus
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Don Zickus @ 2014-02-10 13:29 UTC
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, gorcunov, peterz, dyoung, vgoyal, tglx,
	dzickus

Commit-ID:  90ed5b0fa5eb96e1cbb34aebf6a9ed96ee1587ec
Gitweb:     http://git.kernel.org/tip/90ed5b0fa5eb96e1cbb34aebf6a9ed96ee1587ec
Author:     Don Zickus <dzickus@redhat.com>
AuthorDate: Sun, 9 Feb 2014 13:20:18 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 9 Feb 2014 13:20:35 +0100

perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIs

A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What is happening is on boot up of the second kernel (the kdump one),
the previous nmi_watchdogs were enabled on thread 0 and thread 1.  The
second kernel only initializes one cpu but the perf counter on thread 1
still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

On its own this would be fine: it is a kdump kernel, strange things happen,
and a single unknown NMI is no big deal.

Unfortunately, the P4 comes with another quirk: the overflow bit has to be
cleared explicitly, otherwise the PMI keeps firing and produces a stream of
NMIs.  This is the problem.

The kdump kernel cannot execute because of the endless NMIs that result.

To solve this, I instrumented the p4 perf init code to walk all the counters
and zero out their configuration registers (just like a normal reset would).

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com
[ Fixed a stylistic detail. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_p4.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index f44c34d..5d466b7 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1339,6 +1339,7 @@ static __initconst const struct x86_pmu p4_pmu = {
 __init int p4_pmu_init(void)
 {
 	unsigned int low, high;
+	int i, reg;
 
 	/* If we get stripped -- indexing fails */
 	BUILD_BUG_ON(ARCH_P4_MAX_CCCR > INTEL_PMC_MAX_GENERIC);
@@ -1357,5 +1358,19 @@ __init int p4_pmu_init(void)
 
 	x86_pmu = p4_pmu;
 
+	/*
+	 * Even though the counters are configured to interrupt a particular
+	 * logical processor when an overflow happens, testing has shown that
+	 * on kdump kernels (which uses a single cpu), thread1's counter
+	 * continues to run and will report an NMI on thread0.  Due to the
+	 * overflow bug, this leads to a stream of unknown NMIs.
+	 *
+	 * Solve this by zero'ing out the registers to mimic a reset.
+	 */
+	for (i = 0; i < x86_pmu.num_counters; i++) {
+		reg = x86_pmu_config_addr(i);
+		wrmsrl_safe(reg, 0ULL);
+	}
+
 	return 0;
 }