public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf, nmi: Move LVT un-masking into irq handlers
@ 2011-04-18 16:56 Don Zickus
  2011-04-20 14:37 ` Cyrill Gorcunov
  0 siblings, 1 reply; 7+ messages in thread
From: Don Zickus @ 2011-04-18 16:56 UTC (permalink / raw)
  To: x86; +Cc: Peter Zijlstra, Robert Richter, LKML, Don Zickus, Cyrill Gorcunov

It was noticed that P4 machines were generating double NMIs for each
perf event.  These extra NMIs lead to 'Dazed and confused' messages on
the screen.

I tracked this down to a P4 quirk that said the overflow bit had to be
cleared before re-enabling the apic LVT mask.  My first attempt was
to move the un-masking inside the perf nmi handler from before the
chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before the
counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to just
simply move the apic LVT un-masking to the beginning of all the
chipset NMI handlers, with the exeption of Pentium4's to fix the
double NMI issue.

Later on we can move the un-masking to later in the handlers to save
a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem box,
and a core2quad box.  'perf top' worked correctly along with various
other small 'perf record' runs.  Anything high stress breaks all the
machines but that is a different problem.

Thanks to various people for testing different versions of this patch.

Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
CC: Cyrill Gorcunov <gorcunov@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c       |    5 +++--
 arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 ++++++++++----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..97c6c44 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1284,6 +1284,9 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/* chipsets have their own quirks when to unmask */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask)) {
 			/*
@@ -1370,8 +1373,6 @@ perf_event_nmi_handler(struct notifier_block *self,
 		return NOTIFY_DONE;
 	}
 
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-
 	handled = x86_pmu.handle_irq(args->regs);
 	if (!handled)
 		return NOTIFY_DONE;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 8fc2b2c..d2326e1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -933,6 +933,9 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/* chipsets have their own quirks when to unmask */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index c2520e1..20e37ed 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -950,11 +950,17 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 			p4_pmu_disable_event(event);
 	}
 
-	if (handled) {
-		/* p4 quirk: unmask it again */
-		apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+	if (handled)
 		inc_irq_stat(apic_perf_irqs);
-	}
+
+        /*
+	 * P4 quirks:
+	 * - An overflown perfctr will assert its interrupt
+	 *   until the OVF flag in its CCCR is cleared.
+	 * - LVTPC is masked on interrupt and must be
+	 *   unmasked by the LVTPC handler.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
 
 	return handled;
 }
-- 
1.7.4.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf, nmi: Move LVT un-masking into irq handlers
  2011-04-18 16:56 [PATCH] perf, " Don Zickus
@ 2011-04-20 14:37 ` Cyrill Gorcunov
  2011-04-20 14:57   ` Don Zickus
  0 siblings, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2011-04-20 14:37 UTC (permalink / raw)
  To: Don Zickus; +Cc: x86, Peter Zijlstra, Robert Richter, LKML, Ingo Molnar

On 04/18/2011 08:56 PM, Don Zickus wrote:
> It was noticed that P4 machines were generating double NMIs for each
> perf event.  These extra NMIs lead to 'Dazed and confused' messages on
> the screen.
> 
> I tracked this down to a P4 quirk that said the overflow bit had to be
> cleared before re-enabling the apic LVT mask.  My first attempt was
> to move the un-masking inside the perf nmi handler from before the
> chipset NMI handler to after.
> 
> This broke Nehalem boxes that seem to like the unmasking before the
> counters themselves are re-enabled.
> 
> In order to keep this change simple for 2.6.39, I decided to just
> simply move the apic LVT un-masking to the beginning of all the
> chipset NMI handlers, with the exeption of Pentium4's to fix the
> double NMI issue.
> 
> Later on we can move the un-masking to later in the handlers to save
> a number of 'extra' NMIs on those particular chipsets.
> 
> I tested this change on a P4 machine, an AMD machine, a Nehalem box,
> and a core2quad box.  'perf top' worked correctly along with various
> other small 'perf record' runs.  Anything high stress breaks all the
> machines but that is a different problem.
> 
> Thanks to various people for testing different versions of this patch.
> 
> Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> CC: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  arch/x86/kernel/cpu/perf_event.c       |    5 +++--
>  arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
>  arch/x86/kernel/cpu/perf_event_p4.c    |   14 ++++++++++----
>  3 files changed, 16 insertions(+), 6 deletions(-)
> 

Looks good to me (in terms of P4 PMU). So as an urgent solution

Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>

For more deeper tests I've a slightly modified version which I've tested on Nehalem
and P4 but more testing is a way appreciated if someone interested in.

Don, sorry for delay, I somehow missed this patch at first place :/ And probably
the patch below could help with "stess" problem?

Ingo, please fetch Don's patch (not mine which I propose for testing instead of Don's patch) since
I believe it's important issue. Any tune ups could be done on top.

--
perf, x86: P4 PMU -- Fix race condition with LVTPC masking v3

Because perf_event_nmi_handler was unmasking LVTPC at the very
beginning of NMI handler routine P4 PMU observed "Uhhuh. NMI
received for unknown reason".

The description of such effect is rather the following: PMI issued,
it delivers APIC, APIC get masked and notifies cpu with NMI vector,
NMI handler starts and unmask APIC again, opening a window for a new
PMI delivery (but in real it's same PMI just signalled twice). So in
general such situation pretty well managed by our NMI in-flight catcher
but sometime it was found that such signal is delayed and delivered
at moment when counter get active again so kernel can't figure out
who was an initiator of NMI.

Moving APIC unmasking after x86_pmu.handle_irq lead to another problem,
at least on Nehalem it was found that second PMI signal may arrive APIC
too early, mask it and never propagate up to cpu as nmi signal. As result
apic entry get masked and no new PMI signal accepted.

Finally the conclusion was to make APIC unmaksing as a part of PMU
specific path, every PMU driver has its own unmasking call.

https://bugzilla.kernel.org/show_bug.cgi?id=33252

Reported-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c       |    4 ++--
 arch/x86/kernel/cpu/perf_event_intel.c |   15 +++++++++++----
 arch/x86/kernel/cpu/perf_event_p4.c    |    9 +++++++--
 3 files changed, 20 insertions(+), 8 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1325,6 +1325,8 @@ static int x86_pmu_handle_irq(struct pt_
 	if (handled)
 		inc_irq_stat(apic_perf_irqs);

+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	return handled;
 }

@@ -1377,8 +1379,6 @@ perf_event_nmi_handler(struct notifier_b
 		return NOTIFY_DONE;
 	}

-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-
 	handled = x86_pmu.handle_irq(args->regs);
 	if (!handled)
 		return NOTIFY_DONE;
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_intel.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_intel.c
@@ -936,10 +936,8 @@ static int intel_pmu_handle_irq(struct p
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
-	if (!status) {
-		intel_pmu_enable_all(0);
-		return handled;
-	}
+	if (!status)
+		goto done;

 	loops = 0;
 again:
@@ -988,6 +986,15 @@ again:
 		goto again;

 done:
+	/*
+	 * We have to unmask LVTPC *before* enabling counters,
+	 * the key moment is that LVTPC is getting masked on
+	 * PMI arrival and if PMU enabled before APIC unmasked
+	 * a new oveflow signal may reach it but not propagated
+	 * as NMI (due to transition timings) and LVTPC eventually
+	 * stick masked forever dropping any further PMI signals.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	intel_pmu_enable_all(0);
 	return handled;
 }
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -950,8 +950,13 @@ static int p4_pmu_handle_irq(struct pt_r
 	}

 	if (handled) {
-		/* p4 quirk: unmask it again */
-		apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+		/*
+		 * P4 quirk: it needs two writes otherwise
+		 * may stay masked (as LVTPC is gettting
+		 * masked on irq arrival).
+		 */
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
 		inc_irq_stat(apic_perf_irqs);
 	}


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf, nmi: Move LVT un-masking into irq handlers
  2011-04-20 14:37 ` Cyrill Gorcunov
@ 2011-04-20 14:57   ` Don Zickus
  2011-04-20 15:03     ` Cyrill Gorcunov
  0 siblings, 1 reply; 7+ messages in thread
From: Don Zickus @ 2011-04-20 14:57 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: x86, Peter Zijlstra, Robert Richter, LKML, Ingo Molnar

On Wed, Apr 20, 2011 at 06:37:10PM +0400, Cyrill Gorcunov wrote:
> Don, sorry for delay, I somehow missed this patch at first place :/ And probably
> the patch below could help with "stess" problem?

The patch I am using for the 'stress' problem is different than what I
posted because I wanted to keep it simple for 2.6.39.  But yeah I already
moved the unmasking to the enable_pmus_all(0) line.

The thing is that cuts down on some of these in-flight NMIs but not all of
them.  I came up with some logic that catches all the in-flight NMIs on
Nehalem and AMDs but my core2quad still fails.  I'll try to post that
today while requesting help.

Still not sure you need the double APIC writes (as your patch) shows.  I
haven't seen a case in my testing where a single APIC write fails.

Cheers,
Don

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf, nmi: Move LVT un-masking into irq handlers
  2011-04-20 14:57   ` Don Zickus
@ 2011-04-20 15:03     ` Cyrill Gorcunov
  0 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2011-04-20 15:03 UTC (permalink / raw)
  To: Don Zickus; +Cc: x86, Peter Zijlstra, Robert Richter, LKML, Ingo Molnar

On 04/20/2011 06:57 PM, Don Zickus wrote:
> On Wed, Apr 20, 2011 at 06:37:10PM +0400, Cyrill Gorcunov wrote:
>> Don, sorry for delay, I somehow missed this patch at first place :/ And probably
>> the patch below could help with "stess" problem?
> 
> The patch I am using for the 'stress' problem is different than what I
> posted because I wanted to keep it simple for 2.6.39.  But yeah I already
> moved the unmasking to the enable_pmus_all(0) line.

ok

> 
> The thing is that cuts down on some of these in-flight NMIs but not all of
> them.  I came up with some logic that catches all the in-flight NMIs on
> Nehalem and AMDs but my core2quad still fails.  I'll try to post that
> today while requesting help.

ok

> 
> Still not sure you need the double APIC writes (as your patch) shows.  I
> haven't seen a case in my testing where a single APIC write fails.

yup, we can drop it, thanks!

> 
> Cheers,
> Don

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] perf, nmi: Move LVT un-masking into irq handlers
@ 2011-04-27 10:32 Don Zickus
  2011-04-27 17:26 ` Cyrill Gorcunov
  2011-04-27 18:33 ` [tip:perf/urgent] perf, x86, " tip-bot for Don Zickus
  0 siblings, 2 replies; 7+ messages in thread
From: Don Zickus @ 2011-04-27 10:32 UTC (permalink / raw)
  To: x86; +Cc: LKML, Don Zickus, Cyrill Gorcunov

It was noticed that P4 machines were generating double NMIs for each
perf event.  These extra NMIs lead to 'Dazed and confused' messages on
the screen.

I tracked this down to a P4 quirk that said the overflow bit had to be
cleared before re-enabling the apic LVT mask.  My first attempt was
to move the un-masking inside the perf nmi handler from before the
chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before the
counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to just
simply move the apic LVT un-masking to the beginning of all the
chipset NMI handlers, with the exception of Pentium4's to fix the
double NMI issue.

Later on we can move the un-masking to later in the handlers to save
a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem box,
and a core2quad box.  'perf top' worked correctly along with various
other small 'perf record' runs.  Anything high stress breaks all the
machines but that is a different problem.

Thanks to various people for testing different versions of this patch.

Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
CC: Cyrill Gorcunov <gorcunov@gmail.com>

---
Originally I was going to bundle this with Cyrill's P4 alias patch but that
may take awhile, so I am posting this stand alone for now.  Sorry for the
delay.

---
 arch/x86/kernel/cpu/perf_event.c       |   12 ++++++++++--
 arch/x86/kernel/cpu/perf_event_intel.c |   10 ++++++++++
 arch/x86/kernel/cpu/perf_event_p4.c    |   17 +++++++++++++----
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..9d1130d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1284,6 +1284,16 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/*
+	 * Some chipsets need to unmask the LVTPC in a particular spot
+	 * inside the nmi handler.  As a result, the unmasking was pushed
+	 * into all the nmi handlers.
+	 *
+	 * This generic handler doesn't seem to have any issues where the
+	 * unmasking occurs so it was left at the top.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask)) {
 			/*
@@ -1370,8 +1380,6 @@ perf_event_nmi_handler(struct notifier_block *self,
 		return NOTIFY_DONE;
 	}
 
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-
 	handled = x86_pmu.handle_irq(args->regs);
 	if (!handled)
 		return NOTIFY_DONE;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 8fc2b2c..9607953 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -933,6 +933,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/*
+	 * Some chipsets need to unmask the LVTPC in a particular spot
+	 * inside the nmi handler.  As a result, the unmasking was pushed
+	 * into all the nmi handlers.
+	 *
+	 * This handler doesn't seem to have any issues with the unmasking
+	 * so it was left at the top.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index c2520e1..60da195 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -950,11 +950,20 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 			p4_pmu_disable_event(event);
 	}
 
-	if (handled) {
-		/* p4 quirk: unmask it again */
-		apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+	if (handled)
 		inc_irq_stat(apic_perf_irqs);
-	}
+
+	/*
+	 * When dealing with the unmasking of the LVTPC on P4 perf hw, it has
+	 * been observed that the OVF bit flag has to be cleared first _before_
+	 * the LVTPC can be unmasked.
+	 *
+	 * The reason is the NMI line will continue to be asserted while the OVF
+	 * bit is set.  This causes a second NMI to generate if the LVTPC is
+	 * unmasked before the OVF bit is cleared, leading to unknown NMI
+	 * messages.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
 
 	return handled;
 }
-- 
1.7.4.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf, nmi: Move LVT un-masking into irq handlers
  2011-04-27 10:32 [PATCH] perf, nmi: Move LVT un-masking into irq handlers Don Zickus
@ 2011-04-27 17:26 ` Cyrill Gorcunov
  2011-04-27 18:33 ` [tip:perf/urgent] perf, x86, " tip-bot for Don Zickus
  1 sibling, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2011-04-27 17:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: x86, LKML, Ingo Molnar

On 04/27/2011 02:32 PM, Don Zickus wrote:
> It was noticed that P4 machines were generating double NMIs for each
> perf event.  These extra NMIs lead to 'Dazed and confused' messages on
> the screen.
> 
> I tracked this down to a P4 quirk that said the overflow bit had to be
> cleared before re-enabling the apic LVT mask.  My first attempt was
> to move the un-masking inside the perf nmi handler from before the
> chipset NMI handler to after.
> 
> This broke Nehalem boxes that seem to like the unmasking before the
> counters themselves are re-enabled.
> 
> In order to keep this change simple for 2.6.39, I decided to just
> simply move the apic LVT un-masking to the beginning of all the
> chipset NMI handlers, with the exception of Pentium4's to fix the
> double NMI issue.
> 
> Later on we can move the un-masking to later in the handlers to save
> a number of 'extra' NMIs on those particular chipsets.
> 
> I tested this change on a P4 machine, an AMD machine, a Nehalem box,
> and a core2quad box.  'perf top' worked correctly along with various
> other small 'perf record' runs.  Anything high stress breaks all the
> machines but that is a different problem.
> 
> Thanks to various people for testing different versions of this patch.
> 
> Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> CC: Cyrill Gorcunov <gorcunov@gmail.com>
> 
> ---
> Originally I was going to bundle this with Cyrill's P4 alias patch but that
> may take awhile, so I am posting this stand alone for now.  Sorry for the
> delay.
> 

Looks good to me, thanks a lot Don! I'll send you updated alternate events
patch.

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:perf/urgent] perf, x86, nmi: Move LVT un-masking into irq handlers
  2011-04-27 10:32 [PATCH] perf, nmi: Move LVT un-masking into irq handlers Don Zickus
  2011-04-27 17:26 ` Cyrill Gorcunov
@ 2011-04-27 18:33 ` tip-bot for Don Zickus
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Don Zickus @ 2011-04-27 18:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, gorcunov, sruffell, tglx, dzickus,
	mingo

Commit-ID:  2bce5daca28346f19c190dbdb5542c9fe3e8c6e6
Gitweb:     http://git.kernel.org/tip/2bce5daca28346f19c190dbdb5542c9fe3e8c6e6
Author:     Don Zickus <dzickus@redhat.com>
AuthorDate: Wed, 27 Apr 2011 06:32:33 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 27 Apr 2011 17:59:11 +0200

perf, x86, nmi: Move LVT un-masking into irq handlers

It was noticed that P4 machines were generating double NMIs for
each perf event.  These extra NMIs lead to 'Dazed and confused'
messages on the screen.

I tracked this down to a P4 quirk that said the overflow bit had
to be cleared before re-enabling the apic LVT mask.  My first
attempt was to move the un-masking inside the perf nmi handler
from before the chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before
the counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to
just simply move the apic LVT un-masking to the beginning of all
the chipset NMI handlers, with the exception of Pentium4's to
fix the double NMI issue.

Later on we can move the un-masking to later in the handlers to
save a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem
box, and a core2quad box.  'perf top' worked correctly along
with various other small 'perf record' runs.  Anything high
stress breaks all the machines but that is a different problem.

Thanks to various people for testing different versions of this
patch.

Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CC: Cyrill Gorcunov <gorcunov@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   12 ++++++++++--
 arch/x86/kernel/cpu/perf_event_intel.c |   10 ++++++++++
 arch/x86/kernel/cpu/perf_event_p4.c    |   17 +++++++++++++----
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index fac0654..e638689 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1288,6 +1288,16 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/*
+	 * Some chipsets need to unmask the LVTPC in a particular spot
+	 * inside the nmi handler.  As a result, the unmasking was pushed
+	 * into all the nmi handlers.
+	 *
+	 * This generic handler doesn't seem to have any issues where the
+	 * unmasking occurs so it was left at the top.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask)) {
 			/*
@@ -1374,8 +1384,6 @@ perf_event_nmi_handler(struct notifier_block *self,
 		return NOTIFY_DONE;
 	}
 
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-
 	handled = x86_pmu.handle_irq(args->regs);
 	if (!handled)
 		return NOTIFY_DONE;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9ae4a2a..e61539b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -933,6 +933,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
+	/*
+	 * Some chipsets need to unmask the LVTPC in a particular spot
+	 * inside the nmi handler.  As a result, the unmasking was pushed
+	 * into all the nmi handlers.
+	 *
+	 * This handler doesn't seem to have any issues with the unmasking
+	 * so it was left at the top.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index d1f77e2..e93fcd5 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -950,11 +950,20 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 			x86_pmu_stop(event, 0);
 	}
 
-	if (handled) {
-		/* p4 quirk: unmask it again */
-		apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+	if (handled)
 		inc_irq_stat(apic_perf_irqs);
-	}
+
+	/*
+	 * When dealing with the unmasking of the LVTPC on P4 perf hw, it has
+	 * been observed that the OVF bit flag has to be cleared first _before_
+	 * the LVTPC can be unmasked.
+	 *
+	 * The reason is the NMI line will continue to be asserted while the OVF
+	 * bit is set.  This causes a second NMI to generate if the LVTPC is
+	 * unmasked before the OVF bit is cleared, leading to unknown NMI
+	 * messages.
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
 
 	return handled;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-04-27 18:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-27 10:32 [PATCH] perf, nmi: Move LVT un-masking into irq handlers Don Zickus
2011-04-27 17:26 ` Cyrill Gorcunov
2011-04-27 18:33 ` [tip:perf/urgent] perf, x86, " tip-bot for Don Zickus
  -- strict thread matches above, loose matches on Subject: below --
2011-04-18 16:56 [PATCH] perf, " Don Zickus
2011-04-20 14:37 ` Cyrill Gorcunov
2011-04-20 14:57   ` Don Zickus
2011-04-20 15:03     ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox