From: Peter Zijlstra <peterz@infradead.org>
To: Don Zickus <dzickus@redhat.com>
Cc: mingo@elte.hu, robert.richter@amd.com, gorcunov@gmail.com,
fweisbec@gmail.com, linux-kernel@vger.kernel.org,
ying.huang@intel.com, ming.m.lin@intel.com, yinghai@kernel.org,
andi@firstfloor.org, eranian@google.com
Subject: Re: [PATCH 0/3 v2] nmi perf fixes
Date: Fri, 10 Sep 2010 15:34:53 +0200
Message-ID: <1284125693.402.58.camel@laptop>
In-Reply-To: <1284118900.402.35.camel@laptop>
On Fri, 2010-09-10 at 13:41 +0200, Peter Zijlstra wrote:
> On Thu, 2010-09-02 at 15:07 -0400, Don Zickus wrote:
> > Fixes to allow unknown nmis to pass through the perf nmi handler instead
> > of being swallowed. Contains patches that are already in Ingo's tree. Added
> > here for completeness. Based on ingo/tip
> Both Ingo and I are getting Dazed and confused on our AMD machines, it
> started before yesterday (that is, after backing out all my recent
> changes it still gets dazed), so I suspect this set.
>
> I'll look at getting a trace of the thing, but if any of you has a
> bright idea...
<...>-2155 [000] 57.298895: perf_event_nmi_handler: NMI-handled(1): 5735 0 0
<...>-2155 [000] 57.298896: perf_event_nmi_handler: NMI-stop: 5735 0 0
<...>-2155 [000] 57.298898: perf_event_nmi_handler: NMI: 5736 0 0
<...>-2155 [000] 57.298898: x86_pmu_handle_irq: OVERFLOW: 0
<...>-2155 [000] 57.298899: x86_pmu_handle_irq: HANDLED: 1
<...>-2155 [000] 57.298901: x86_pmu_handle_irq: OVERFLOW: 2
<...>-2155 [000] 57.298901: x86_pmu_handle_irq: OVERFLOW: 3
<...>-2155 [000] 57.298902: perf_event_nmi_handler: NMI-handled(1): 5736 0 0
<...>-2155 [000] 57.298903: perf_event_nmi_handler: NMI-stop: 5736 0 0
<...>-2155 [000] 57.298905: perf_event_nmi_handler: NMI: 5737 0 0
<...>-2155 [000] 57.298905: x86_pmu_handle_irq: OVERFLOW: 0
<...>-2155 [000] 57.298906: x86_pmu_handle_irq: HANDLED: 1
<...>-2155 [000] 57.298908: x86_pmu_handle_irq: OVERFLOW: 2
<...>-2155 [000] 57.298908: x86_pmu_handle_irq: OVERFLOW: 3
<...>-2155 [000] 57.298909: perf_event_nmi_handler: NMI-handled(1): 5737 0 0
<...>-2155 [000] 57.298909: perf_event_nmi_handler: NMI-stop: 5737 0 0
<...>-2155 [000] 57.298911: perf_event_nmi_handler: NMI: 5738 0 0
<...>-2155 [000] 57.298912: x86_pmu_handle_irq: OVERFLOW: 0
<...>-2155 [000] 57.298913: x86_pmu_handle_irq: HANDLED: 1
<...>-2155 [000] 57.298915: x86_pmu_handle_irq: OVERFLOW: 2
<...>-2155 [000] 57.298916: x86_pmu_handle_irq: OVERFLOW: 3
<...>-2155 [000] 57.298916: perf_event_nmi_handler: NMI-handled(1): 5738 0 0
<...>-2155 [000] 57.298917: perf_event_nmi_handler: NMI-stop: 5738 0 0
<...>-2155 [000] 57.298919: perf_event_nmi_handler: NMI: 5739 0 0
<...>-2155 [000] 57.298920: x86_pmu_handle_irq: OVERFLOW: 0
<...>-2155 [000] 57.298921: x86_pmu_handle_irq: OVERFLOW: 2
<...>-2155 [000] 57.298921: x86_pmu_handle_irq: OVERFLOW: 3
<...>-2155 [000] 57.298922: perf_event_nmi_handler: NMI-handled(0): 5739 0 0
<...>-2155 [000] 57.298923: perf_event_nmi_handler: NMI: 5739 0 0
<...>-2155 [000] 57.298924: x86_pmu_handle_irq: OVERFLOW: 0
<...>-2155 [000] 57.298925: x86_pmu_handle_irq: OVERFLOW: 2
<...>-2155 [000] 57.298925: x86_pmu_handle_irq: OVERFLOW: 3
<...>-2155 [000] 57.298926: perf_event_nmi_handler: NMI-handled(0): 5739 0 0
<...>-2155 [000] 57.298927: perf_event_nmi_handler: NMI: 5739 0 0
<...>-2155 [000] 57.298928: perf_event_nmi_handler: NMI-fail
Which suggests that 5738 was a good NMI and 5739 is an unhandled NMI; we
see it twice, once through DIE_NMI and once through DIE_NMIUNKNOWN.
The problem seems to be that we don't tag it because handled isn't
larger than 1.
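To make the tagging rule concrete, here is a minimal userspace sketch of it. It is a simplified model and not the kernel code: struct pmu_nmi_state and both helper names are made up for illustration, and the only rule modelled is the one described above (the next NMI is tagged only when more than one counter was handled).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU pmu_nmi state named in the patch. */
struct pmu_nmi_state {
	unsigned int marked;	/* NMI number expected to be a back-to-back NMI */
	int handled;		/* counters handled when the mark was set */
};

/*
 * Called after the PMU handler reported `handled` overflowed counters for
 * NMI number `this_nmi`. Returns true if the next NMI was tagged.
 * Simplification: only the "handled > 1" rule is modelled.
 */
static bool tag_next_nmi(struct pmu_nmi_state *s, unsigned int this_nmi,
			 int handled)
{
	assert(handled > 0);

	if (handled > 1) {
		s->marked = this_nmi + 1;
		s->handled = handled;
		return true;
	}
	return false;
}

/* DIE_NMIUNKNOWN path: consume the NMI only if it was tagged earlier. */
static bool consume_unknown_nmi(const struct pmu_nmi_state *s,
				unsigned int this_nmi)
{
	return this_nmi == s->marked;
}
```

With the values from the trace (NMI 5738 handled one counter, marked stays 0), consume_unknown_nmi() returns false for 5739, which matches the NMI-fail line.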
It's easy to reproduce on my Opteron, simply run:
  perf record -fg ./hackbench 50
a few times.
I'll try and reset the PMU on init to clear some of those OVERFLOW msgs
(hmm, maybe also read the ctrl word and check EN and INT before doing
the overflow check)..
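A rough sketch of what that extra check could look like, written as a plain C function so it can be tried outside the kernel. The function name and the EVENTSEL_* macro names are illustrative assumptions; the bit positions are the EN (bit 22) and INT (bit 20) bits of the x86 event-select register, and the counter width is passed in rather than read from x86_pmu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event-select bit positions on x86: INT is bit 20, EN is bit 22. */
#define EVENTSEL_INT	(1ULL << 20)
#define EVENTSEL_ENABLE	(1ULL << 22)

/*
 * Hypothetical variant of the debug helper: only report a counter when
 * its control word says it is enabled and allowed to raise an interrupt.
 */
static bool pmc_overflow_armed(uint64_t ctrl, uint64_t count, int cntval_bits)
{
	assert(cntval_bits > 0 && cntval_bits <= 64);

	/* A counter that is disabled or cannot interrupt is not reported. */
	if (!(ctrl & EVENTSEL_ENABLE) || !(ctrl & EVENTSEL_INT))
		return false;

	/* Same top-bit test as pmc_overflow() in the patch below. */
	return (count & (1ULL << (cntval_bits - 1))) != 0;
}
```

In the kernel the two inputs would come from reading the control and counter MSRs for the given index; that part is left out here because it cannot run in userspace.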
---
arch/x86/kernel/cpu/perf_event.c | 34 +++++++++++++++++++++++++++++++++-
1 files changed, 33 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index de6569c..9aff608 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1127,6 +1127,15 @@ static void x86_pmu_disable(struct perf_event *event)
 	perf_event_update_userpage(event);
 }
 
+static int pmc_overflow(int idx)
+{
+	u64 val;
+
+	rdmsrl(x86_pmu.perfctr + idx, val);
+
+	return !!(val & (1ULL << (x86_pmu.cntval_bits - 1)));
+}
+
 static int x86_pmu_handle_irq(struct pt_regs *regs)
 {
 	struct perf_sample_data data;
@@ -1141,6 +1150,8 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (pmc_overflow(idx))
+			trace_printk("OVERFLOW: %d\n", idx);
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
 
@@ -1154,6 +1165,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 		/*
 		 * event overflow
 		 */
+		trace_printk("HANDLED: %d\n", idx);
 		handled++;
 		data.period = event->hw.last_period;
 
@@ -1215,6 +1227,11 @@ perf_event_nmi_handler(struct notifier_block *self,
 	unsigned int this_nmi;
 	int handled;
 
+	trace_printk("NMI: %d %d %d\n",
+			percpu_read(irq_stat.__nmi_count),
+			__get_cpu_var(pmu_nmi).marked,
+			__get_cpu_var(pmu_nmi).handled);
+
 	if (!atomic_read(&active_events))
 		return NOTIFY_DONE;
 
@@ -1224,9 +1241,12 @@ perf_event_nmi_handler(struct notifier_block *self,
 		break;
 	case DIE_NMIUNKNOWN:
 		this_nmi = percpu_read(irq_stat.__nmi_count);
-		if (this_nmi != __get_cpu_var(pmu_nmi).marked)
+		if (this_nmi != __get_cpu_var(pmu_nmi).marked) {
+			trace_printk("NMI-fail\n");
 			/* let the kernel handle the unknown nmi */
 			return NOTIFY_DONE;
+		}
+		trace_printk("NMI-consume\n");
 		/*
 		 * This one is a PMU back-to-back nmi. Two events
 		 * trigger 'simultaneously' raising two back-to-back
@@ -1242,6 +1262,13 @@ perf_event_nmi_handler(struct notifier_block *self,
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 
 	handled = x86_pmu.handle_irq(args->regs);
+
+	trace_printk("NMI-handled(%d): %d %d %d\n",
+			handled,
+			percpu_read(irq_stat.__nmi_count),
+			__get_cpu_var(pmu_nmi).marked,
+			__get_cpu_var(pmu_nmi).handled);
+
 	if (!handled)
 		return NOTIFY_DONE;
 
@@ -1264,6 +1291,11 @@ perf_event_nmi_handler(struct notifier_block *self,
 		__get_cpu_var(pmu_nmi).handled = handled;
 	}
 
+	trace_printk("NMI-stop: %d %d %d\n",
+			percpu_read(irq_stat.__nmi_count),
+			__get_cpu_var(pmu_nmi).marked,
+			__get_cpu_var(pmu_nmi).handled);
+
 	return NOTIFY_STOP;
 }
 