From: Don Zickus <dzickus@redhat.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>,
"gorcunov@gmail.com" <gorcunov@gmail.com>,
"fweisbec@gmail.com" <fweisbec@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"ming.m.lin@intel.com" <ming.m.lin@intel.com>,
"yinghai@kernel.org" <yinghai@kernel.org>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"eranian@google.com" <eranian@google.com>
Subject: Re: [PATCH] perf, x86: catch spurious interrupts after disabling counters
Date: Thu, 23 Sep 2010 23:18:34 -0400 [thread overview]
Message-ID: <20100924031834.GP26290@redhat.com> (raw)
In-Reply-To: <20100924000234.GO26290@redhat.com>
On Thu, Sep 23, 2010 at 08:02:34PM -0400, Don Zickus wrote:
> On Wed, Sep 15, 2010 at 06:20:34PM +0200, Robert Richter wrote:
> > On 14.09.10 19:41:32, Robert Richter wrote:
> > > I found the reason why we get the unknown nmi. For some reason
> > > cpuc->active_mask in x86_pmu_handle_irq() is zero. Thus, no counters
> > > are handled when we get an nmi. It seems there is somewhere a race
> > > accessing the active_mask. So far I don't have a fix available.
> > > Changing x86_pmu_stop() did not help:
> >
> > The patch below for tip/perf/urgent fixes this.
> >
> > -Robert
>
> I was able to duplicate the problem and can confirm this patch fixes the
> issue for me. I tried poking around (similar to things Robert probably
> did) and had no luck. Something just doesn't make sense, but I guess for
> now this patch is good enough for me. :-)
Ah ha! I figured out what the problem was, need to disable the pmu while
processing the nmi. :-) Finally something simple in this crazy unknown
NMI spree.
Oh yeah and trace_printk is now my new favorite tool!
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 23 Sep 2010 22:52:09 -0400
Subject: [PATCH] x86, perf: disable pmu from counting when processing irq
On certain AMD and Intel machines, the pmu was left enabled
while the counters were reset during handling of the NMI.
After the counters are reset, code continues to process an
overflow. During this time another counter overflow interrupt
could happen because the counter is still ticking. This leads to
an unknown NMI.
static int x86_pmu_handle_irq(struct pt_regs *regs)
{
<snip..>
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask))
continue;
<snip..>
counter reset--> if (!x86_perf_event_set_period(event))
continue;
still ticking--> if (perf_event_overflow(event, 1, &data, regs))
<boom overflow>
stopped here --> x86_pmu_stop(event);
The way to solve this is to disable the pmu while processing the
overflows and re-enable when done.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/cpu/perf_event.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 48c6d8d..d4fe95d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1158,6 +1158,8 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
cpuc = &__get_cpu_var(cpu_hw_events);
+ x86_pmu_disable_all();
+
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask))
continue;
@@ -1182,6 +1184,8 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
x86_pmu_stop(event, 0);
}
+ x86_pmu_enable_all(0);
+
if (handled)
inc_irq_stat(apic_perf_irqs);
--
1.7.2.3
next prev parent reply other threads:[~2010-09-24 3:19 UTC|newest]
Thread overview: 117+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-02 19:07 [PATCH 0/3 v2] nmi perf fixes Don Zickus
2010-09-02 19:07 ` [PATCH 1/3] perf, x86: Fix accidentally ack'ing a second event on intel perf counter Don Zickus
2010-09-02 19:26 ` Cyrill Gorcunov
2010-09-02 20:00 ` Don Zickus
2010-09-02 20:36 ` Cyrill Gorcunov
2010-09-03 7:10 ` [tip:perf/urgent] " tip-bot for Don Zickus
2010-09-03 7:39 ` Yinghai Lu
2010-09-03 15:00 ` Don Zickus
2010-09-03 17:15 ` Yinghai Lu
2010-09-03 18:35 ` Don Zickus
2010-09-03 19:24 ` Yinghai Lu
2010-09-03 20:10 ` Don Zickus
2010-10-04 23:24 ` Yinghai Lu
2010-10-11 20:25 ` Don Zickus
2010-09-02 19:07 ` [PATCH 2/3] perf, x86: Try to handle unknown nmis with an enabled PMU Don Zickus
2010-09-03 7:11 ` [tip:perf/urgent] " tip-bot for Robert Richter
2010-09-02 19:07 ` [PATCH 3/3] perf, x86: Fix handle_irq return values Don Zickus
2010-09-03 7:10 ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2010-09-10 11:41 ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 12:10 ` Stephane Eranian
2010-09-10 12:13 ` Stephane Eranian
2010-09-10 13:27 ` Don Zickus
2010-09-10 14:46 ` Ingo Molnar
2010-09-10 15:17 ` Robert Richter
2010-09-10 15:58 ` Peter Zijlstra
2010-09-10 16:41 ` Ingo Molnar
2010-09-10 16:42 ` Ingo Molnar
2010-09-10 16:37 ` Ingo Molnar
2010-09-10 16:51 ` Ingo Molnar
2010-09-10 15:56 ` [PATCH] x86: fix duplicate calls of the nmi handler Robert Richter
2010-09-10 16:15 ` Peter Zijlstra
2010-09-11 9:41 ` Ingo Molnar
2010-09-11 11:44 ` Robert Richter
2010-09-11 12:45 ` Ingo Molnar
2010-09-12 9:52 ` Robert Richter
2010-09-13 14:37 ` Robert Richter
2010-09-14 17:41 ` Robert Richter
2010-09-15 16:20 ` [PATCH] perf, x86: catch spurious interrupts after disabling counters Robert Richter
2010-09-15 16:36 ` Stephane Eranian
2010-09-15 17:00 ` Robert Richter
2010-09-15 17:32 ` Stephane Eranian
2010-09-15 18:44 ` Robert Richter
2010-09-15 19:34 ` Cyrill Gorcunov
2010-09-15 20:21 ` Stephane Eranian
2010-09-15 20:39 ` Cyrill Gorcunov
2010-09-15 22:27 ` Robert Richter
2010-09-16 14:51 ` Frederic Weisbecker
2010-09-15 16:46 ` Cyrill Gorcunov
2010-09-15 16:47 ` Stephane Eranian
2010-09-15 17:02 ` Cyrill Gorcunov
2010-09-15 17:28 ` Robert Richter
2010-09-15 17:40 ` Cyrill Gorcunov
2010-09-15 22:10 ` Robert Richter
2010-09-16 6:53 ` Cyrill Gorcunov
2010-09-16 17:34 ` Peter Zijlstra
2010-09-17 8:51 ` Robert Richter
2010-09-17 9:14 ` Peter Zijlstra
2010-09-17 13:06 ` Stephane Eranian
2010-09-20 8:41 ` Robert Richter
2010-09-24 0:02 ` Don Zickus
2010-09-24 3:18 ` Don Zickus [this message]
2010-09-24 10:03 ` Robert Richter
2010-09-24 13:38 ` Stephane Eranian
2010-09-30 12:33 ` Peter Zijlstra
2010-09-24 18:11 ` Don Zickus
2010-09-24 10:41 ` [tip:perf/urgent] perf, x86: Catch " tip-bot for Robert Richter
2010-09-29 12:26 ` Stephane Eranian
2010-09-29 12:53 ` Robert Richter
2010-09-29 12:54 ` Robert Richter
2010-09-29 13:13 ` Stephane Eranian
2010-09-29 13:28 ` Stephane Eranian
2010-09-29 15:01 ` Robert Richter
2010-09-29 15:12 ` Robert Richter
2010-09-29 15:27 ` Cyrill Gorcunov
2010-09-29 15:33 ` Stephane Eranian
2010-09-29 15:45 ` Cyrill Gorcunov
2010-09-29 15:51 ` Cyrill Gorcunov
2010-09-29 16:32 ` Robert Richter
2010-09-29 16:48 ` Cyrill Gorcunov
2010-09-29 16:00 ` Stephane Eranian
2010-09-29 17:09 ` Robert Richter
2010-09-29 17:41 ` Cyrill Gorcunov
2010-09-29 18:12 ` Don Zickus
2010-09-29 19:42 ` Stephane Eranian
2010-09-29 20:03 ` Don Zickus
2010-09-30 9:12 ` Robert Richter
2010-09-30 19:44 ` Don Zickus
2010-10-01 7:17 ` Robert Richter
[not found] ` <AANLkTimUyLaVaBigjm0-CwRsdh4UXWDiss2ffX53S+k_@mail.gmail.com>
2010-10-01 11:53 ` Stephane Eranian
2010-10-02 9:35 ` Robert Richter
2010-10-04 8:53 ` Stephane Eranian
2010-10-04 9:07 ` Andi Kleen
2010-10-04 17:28 ` Stephane Eranian
2010-09-29 16:31 ` Robert Richter
2010-09-29 16:22 ` Robert Richter
2010-09-29 19:01 ` Don Zickus
2010-09-29 13:39 ` Robert Richter
2010-09-29 13:56 ` Stephane Eranian
2010-09-29 14:00 ` Stephane Eranian
2010-10-02 9:50 ` Robert Richter
2010-10-02 17:40 ` Stephane Eranian
2010-09-29 15:02 ` Cyrill Gorcunov
2010-09-16 17:42 ` [PATCH] x86: fix duplicate calls of the nmi handler Peter Zijlstra
2010-09-16 20:18 ` Stephane Eranian
2010-09-17 7:09 ` Peter Zijlstra
2010-09-17 0:13 ` Huang Ying
2010-09-17 7:52 ` Peter Zijlstra
2010-09-17 8:13 ` Robert Richter
2010-09-17 8:37 ` Cyrill Gorcunov
2010-09-17 8:47 ` Huang Ying
2010-09-10 13:34 ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 13:52 ` Peter Zijlstra
2010-09-13 8:55 ` Cyrill Gorcunov
2010-09-13 9:54 ` Stephane Eranian
2010-09-13 10:07 ` Cyrill Gorcunov
2010-09-13 10:10 ` Stephane Eranian
2010-09-13 10:12 ` Cyrill Gorcunov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100924031834.GP26290@redhat.com \
--to=dzickus@redhat.com \
--cc=andi@firstfloor.org \
--cc=eranian@google.com \
--cc=fweisbec@gmail.com \
--cc=gorcunov@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.m.lin@intel.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=robert.richter@amd.com \
--cc=ying.huang@intel.com \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).