linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3 v2] nmi perf fixes
@ 2010-09-02 19:07 Don Zickus
  2010-09-02 19:07 ` [PATCH 1/3] perf, x86: Fix accidentally ack'ing a second event on intel perf counter Don Zickus
                   ` (3 more replies)
  0 siblings, 4 replies; 118+ messages in thread
From: Don Zickus @ 2010-09-02 19:07 UTC (permalink / raw)
  To: mingo
  Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
	ying.huang, ming.m.lin, yinghai, andi, eranian, Don Zickus

Fixes to allow unknown nmis to pass through the perf nmi handler instead
of being swallowed.  Contains patches that are already in Ingo's tree.  Added
here for completeness.  Based on ingo/tip

Tested on intel/amd

v2: patch cleanups and consolidation, no code changes

Don Zickus (1):
  perf, x86: Fix accidentally ack'ing a second event on intel perf
    counter

Peter Zijlstra (1):
  perf, x86: Fix handle_irq return values

Robert Richter (1):
  perf, x86: Try to handle unknown nmis with an enabled PMU

 arch/x86/kernel/cpu/perf_event.c       |   59 +++++++++++++++++++++++++-------
 arch/x86/kernel/cpu/perf_event_intel.c |   15 +++++---
 arch/x86/kernel/cpu/perf_event_p4.c    |    2 +-
 3 files changed, 56 insertions(+), 20 deletions(-)

-- 
1.7.2.2


^ permalink raw reply	[flat|nested] 118+ messages in thread
* [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
@ 2010-08-17 15:22 Robert Richter
  2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter
  0 siblings, 1 reply; 118+ messages in thread
From: Robert Richter @ 2010-08-17 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, Cyrill Gorcunov, Lin Ming, Ingo Molnar,
	fweisbec@gmail.com, linux-kernel@vger.kernel.org, Huang, Ying,
	Yinghai Lu, Andi Kleen

Peter,

this is version 3 without timestamp code. Compared to -v1 it
implements the handling for 2-1-0 nmi sequences.

-Robert

>From 71a206394d8a3536b033247ec14ad037fef77236 Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@amd.com>
Date: Tue, 17 Aug 2010 16:42:03 +0200
Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs

When perfctrs are running it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.

The solution to avoid an 'unknown nmi' massage in this case was simply
to stop the nmi handler chain when perfctrs are runnning by stating
the nmi was handled. This has the drawback that a) we can not detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.

This patch addresses this. Now, we check this unknown NMI if it could
be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
handle the unknown nmi.

This is a debug log:

cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue

Deltas:

nmi #32334 340186
nmi #32335 1327704
nmi #32336 1819      <<<< back-to-back nmi [1]
nmi #32337 85961
nmi #32338 284507
nmi #32339 1578809
nmi #32340 217616
nmi #32341 1798      <<<< back-to-back nmi [2]
nmi #32342 240913
nmi #32343 1512809
nmi #32344 116672
nmi #32345 412453
nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one counter
nmi #32348 1773      <<<< 3rd nmi (back-to-back) handling no counter! [3]

For  back-to-back nmi detection there are the following rules:

The perfctr nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2] above).

There is another case if there are two subsequent back-to-back nmis
[3]. The 2nd is detected as back-to-back because the first handled
more than one counter. If the second handles one counter and the 3rd
handles nothing, we drop the 3rd nmi because it could be a
back-to-back nmi.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |   65 ++++++++++++++++++++++++++++++-------
 1 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..bbcc89e 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 		/*
 		 * event overflow
 		 */
-		handled		= 1;
+		handled++;
 		data.period	= event->hw.last_period;
 
 		if (!x86_perf_event_set_period(event))
@@ -1200,12 +1200,20 @@ void perf_events_lapic_init(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }
 
+struct perfctr_nmi {
+	unsigned int	marked;
+	int		handled;
+};
+
+static DEFINE_PER_CPU(struct perfctr_nmi, nmi);
+
 static int __kprobes
 perf_event_nmi_handler(struct notifier_block *self,
 			 unsigned long cmd, void *__args)
 {
 	struct die_args *args = __args;
-	struct pt_regs *regs;
+	unsigned int this_nmi;
+	int handled;
 
 	if (!atomic_read(&active_events))
 		return NOTIFY_DONE;
@@ -1214,22 +1222,53 @@ perf_event_nmi_handler(struct notifier_block *self,
 	case DIE_NMI:
 	case DIE_NMI_IPI:
 		break;
-
+	case DIE_NMIUNKNOWN:
+		this_nmi = percpu_read(irq_stat.__nmi_count);
+		if (this_nmi != __get_cpu_var(nmi).marked)
+			/* let the kernel handle the unknown nmi */
+			return NOTIFY_DONE;
+		/*
+		 * This one is a perfctr back-to-back nmi. Two events
+		 * trigger 'simultaneously' raising two back-to-back
+		 * NMIs. If the first NMI handles both, the latter
+		 * will be empty and daze the CPU. So, we drop it to
+		 * avoid false-positive 'unknown nmi' messages.
+		 */
+		return NOTIFY_STOP;
 	default:
 		return NOTIFY_DONE;
 	}
 
-	regs = args->regs;
-
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	/*
-	 * Can't rely on the handled return value to say it was our NMI, two
-	 * events could trigger 'simultaneously' raising two back-to-back NMIs.
-	 *
-	 * If the first NMI handles both, the latter will be empty and daze
-	 * the CPU.
-	 */
-	x86_pmu.handle_irq(regs);
+
+	handled = x86_pmu.handle_irq(args->regs);
+	if (!handled)
+		return NOTIFY_DONE;
+
+	this_nmi = percpu_read(irq_stat.__nmi_count);
+	if (handled > 1)
+		goto mark_nmi;
+	if ((__get_cpu_var(nmi).marked == this_nmi)
+	    && (__get_cpu_var(nmi).handled > 1))
+		/*
+		 * We could have two subsequent back-to-back nmis: The
+		 * first handles more than one counter, the 2nd
+		 * handles only one counter and the 3rd handles no
+		 * counter.
+		 *
+		 * This is the 2nd nmi because the previous was
+		 * handling more than one counter. We will mark the
+		 * next (3rd) and then drop it if unhandled.
+		 */
+		goto mark_nmi;
+
+	/* this may not trigger back-to-back nmis */
+	return NOTIFY_STOP;
+
+ mark_nmi:
+	/* the next nmi could be a back-to-back nmi */
+	__get_cpu_var(nmi).marked	= this_nmi + 1;
+	__get_cpu_var(nmi).handled	= handled;
 
 	return NOTIFY_STOP;
 }
-- 
1.7.1.1

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply related	[flat|nested] 118+ messages in thread

end of thread, other threads:[~2010-10-11 20:25 UTC | newest]

Thread overview: 118+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-02 19:07 [PATCH 0/3 v2] nmi perf fixes Don Zickus
2010-09-02 19:07 ` [PATCH 1/3] perf, x86: Fix accidentally ack'ing a second event on intel perf counter Don Zickus
2010-09-02 19:26   ` Cyrill Gorcunov
2010-09-02 20:00     ` Don Zickus
2010-09-02 20:36       ` Cyrill Gorcunov
2010-09-03  7:10   ` [tip:perf/urgent] " tip-bot for Don Zickus
2010-09-03  7:39     ` Yinghai Lu
2010-09-03 15:00       ` Don Zickus
2010-09-03 17:15         ` Yinghai Lu
2010-09-03 18:35           ` Don Zickus
2010-09-03 19:24             ` Yinghai Lu
2010-09-03 20:10               ` Don Zickus
2010-10-04 23:24             ` Yinghai Lu
2010-10-11 20:25               ` Don Zickus
2010-09-02 19:07 ` [PATCH 2/3] perf, x86: Try to handle unknown nmis with an enabled PMU Don Zickus
2010-09-03  7:11   ` [tip:perf/urgent] " tip-bot for Robert Richter
2010-09-02 19:07 ` [PATCH 3/3] perf, x86: Fix handle_irq return values Don Zickus
2010-09-03  7:10   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2010-09-10 11:41 ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 12:10   ` Stephane Eranian
2010-09-10 12:13     ` Stephane Eranian
2010-09-10 13:27   ` Don Zickus
2010-09-10 14:46     ` Ingo Molnar
2010-09-10 15:17       ` Robert Richter
2010-09-10 15:58         ` Peter Zijlstra
2010-09-10 16:41           ` Ingo Molnar
2010-09-10 16:42             ` Ingo Molnar
2010-09-10 16:37         ` Ingo Molnar
2010-09-10 16:51           ` Ingo Molnar
2010-09-10 15:56       ` [PATCH] x86: fix duplicate calls of the nmi handler Robert Richter
2010-09-10 16:15         ` Peter Zijlstra
2010-09-11  9:41         ` Ingo Molnar
2010-09-11 11:44           ` Robert Richter
2010-09-11 12:45             ` Ingo Molnar
2010-09-12  9:52               ` Robert Richter
2010-09-13 14:37                 ` Robert Richter
2010-09-14 17:41                   ` Robert Richter
2010-09-15 16:20                     ` [PATCH] perf, x86: catch spurious interrupts after disabling counters Robert Richter
2010-09-15 16:36                       ` Stephane Eranian
2010-09-15 17:00                         ` Robert Richter
2010-09-15 17:32                           ` Stephane Eranian
2010-09-15 18:44                             ` Robert Richter
2010-09-15 19:34                               ` Cyrill Gorcunov
2010-09-15 20:21                                 ` Stephane Eranian
2010-09-15 20:39                                   ` Cyrill Gorcunov
2010-09-15 22:27                                     ` Robert Richter
2010-09-16 14:51                                       ` Frederic Weisbecker
2010-09-15 16:46                       ` Cyrill Gorcunov
2010-09-15 16:47                         ` Stephane Eranian
2010-09-15 17:02                           ` Cyrill Gorcunov
2010-09-15 17:28                             ` Robert Richter
2010-09-15 17:40                               ` Cyrill Gorcunov
2010-09-15 22:10                                 ` Robert Richter
2010-09-16  6:53                                   ` Cyrill Gorcunov
2010-09-16 17:34                       ` Peter Zijlstra
2010-09-17  8:51                         ` Robert Richter
2010-09-17  9:14                           ` Peter Zijlstra
2010-09-17 13:06                       ` Stephane Eranian
2010-09-20  8:41                         ` Robert Richter
2010-09-24  0:02                       ` Don Zickus
2010-09-24  3:18                         ` Don Zickus
2010-09-24 10:03                           ` Robert Richter
2010-09-24 13:38                             ` Stephane Eranian
2010-09-30 12:33                               ` Peter Zijlstra
2010-09-24 18:11                             ` Don Zickus
2010-09-24 10:41                       ` [tip:perf/urgent] perf, x86: Catch " tip-bot for Robert Richter
2010-09-29 12:26                         ` Stephane Eranian
2010-09-29 12:53                           ` Robert Richter
2010-09-29 12:54                             ` Robert Richter
2010-09-29 13:13                               ` Stephane Eranian
2010-09-29 13:28                                 ` Stephane Eranian
2010-09-29 15:01                                   ` Robert Richter
2010-09-29 15:12                                     ` Robert Richter
2010-09-29 15:27                                       ` Cyrill Gorcunov
2010-09-29 15:33                                         ` Stephane Eranian
2010-09-29 15:45                                           ` Cyrill Gorcunov
2010-09-29 15:51                                             ` Cyrill Gorcunov
2010-09-29 16:32                                               ` Robert Richter
2010-09-29 16:48                                                 ` Cyrill Gorcunov
2010-09-29 16:00                                             ` Stephane Eranian
2010-09-29 17:09                                               ` Robert Richter
2010-09-29 17:41                                                 ` Cyrill Gorcunov
2010-09-29 18:12                                                 ` Don Zickus
2010-09-29 19:42                                                   ` Stephane Eranian
2010-09-29 20:03                                                     ` Don Zickus
2010-09-30  9:12                                                     ` Robert Richter
2010-09-30 19:44                                                       ` Don Zickus
2010-10-01  7:17                                                         ` Robert Richter
     [not found]                                                           ` <AANLkTimUyLaVaBigjm0-CwRsdh4UXWDiss2ffX53S+k_@mail.gmail.com>
2010-10-01 11:53                                                             ` Stephane Eranian
2010-10-02  9:35                                                               ` Robert Richter
2010-10-04  8:53                                                                 ` Stephane Eranian
2010-10-04  9:07                                                                   ` Andi Kleen
2010-10-04 17:28                                                                     ` Stephane Eranian
2010-09-29 16:31                                           ` Robert Richter
2010-09-29 16:22                                         ` Robert Richter
2010-09-29 19:01                                         ` Don Zickus
2010-09-29 13:39                                 ` Robert Richter
2010-09-29 13:56                                   ` Stephane Eranian
2010-09-29 14:00                                     ` Stephane Eranian
2010-10-02  9:50                                       ` Robert Richter
2010-10-02 17:40                                         ` Stephane Eranian
2010-09-29 15:02                                     ` Cyrill Gorcunov
2010-09-16 17:42         ` [PATCH] x86: fix duplicate calls of the nmi handler Peter Zijlstra
2010-09-16 20:18           ` Stephane Eranian
2010-09-17  7:09             ` Peter Zijlstra
2010-09-17  0:13           ` Huang Ying
2010-09-17  7:52             ` Peter Zijlstra
2010-09-17  8:13               ` Robert Richter
2010-09-17  8:37                 ` Cyrill Gorcunov
2010-09-17  8:47               ` Huang Ying
2010-09-10 13:34   ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 13:52     ` Peter Zijlstra
2010-09-13  8:55       ` Cyrill Gorcunov
2010-09-13  9:54         ` Stephane Eranian
2010-09-13 10:07           ` Cyrill Gorcunov
2010-09-13 10:10             ` Stephane Eranian
2010-09-13 10:12               ` Cyrill Gorcunov
  -- strict thread matches above, loose matches on Subject: below --
2010-08-17 15:22 [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).