* [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 14:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
From: Don Zickus <dzickus@redhat.com>
Subject: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
When using perf on a Pentium4 box, lots of unknown NMIs would be generated.
This is the result of a subtle P4 quirk. The P4 generates an NMI
when the counter overflows and, unlike other arches where the NMI is a one-time
event, the P4 continues to assert its NMI until cleared by the OS.
As a side effect of this quirk, the NMI on the apic is masked off to prevent
a stream of NMIs until the overflow flag is cleared. During the perf
re-design, this subtlety was overlooked and the apic was unmasked _before_
the overflow flag was cleared. As a result, this generated an extra NMI on
the P4 machines.
The fix is trivial: wait until the NMI is properly handled before un-masking
the apic.
Sadly, in the old nmi watchdog there was a note that explained this exact
behaviour.
Cyrill Gorcunov: Added a comment into the code itself. We should consider
whether we need to unmask LVTPC if no overflow happened at all.
Ingo Molnar: Pointed out that unmasking unconditionally has been proven
correct over time.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Tested-by: Shaun Ruffell <sruffell@digium.com>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Maciej Rutecki <maciej.rutecki@gmail.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Robert Richter <robert.richter@amd.com>
---
Ingo, please make sure I've added proper notes about conditional/unconditional
unmasking to the changelog. Don, I've added a comment in the code so we don't
forget why we need it. Thanks.
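The ordering problem described in the changelog can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not kernel code: `struct toy_pmu`, `toy_apic_poll()`, `toy_unmask()`, `toy_handle_irq()` and `toy_run()` are hypothetical names invented for the illustration. The model assumes, as the changelog describes, that the overflow condition stays asserted until software clears it and that the LVTPC entry is masked once an NMI has been delivered.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, nothing here is a kernel API. */
struct toy_pmu {
	bool overflow;     /* P4-style overflow flag: stays set until software clears it */
	bool lvtpc_masked; /* LVTPC entry is masked once an NMI has been delivered */
	int  nmis;         /* number of NMIs delivered so far */
};

/* Deliver an NMI if overflow is still asserted and the entry is unmasked. */
static void toy_apic_poll(struct toy_pmu *p)
{
	if (p->overflow && !p->lvtpc_masked) {
		p->nmis++;
		p->lvtpc_masked = true;
	}
}

/* Stands in for apic_write(APIC_LVTPC, APIC_DM_NMI) in the patch. */
static void toy_unmask(struct toy_pmu *p)
{
	p->lvtpc_masked = false;
	toy_apic_poll(p); /* a still-asserted overflow fires again immediately */
}

/* Stands in for x86_pmu.handle_irq(): clears the overflow flag. */
static void toy_handle_irq(struct toy_pmu *p)
{
	p->overflow = false;
}

/* Returns how many NMIs a single counter overflow produces. */
static int toy_run(bool unmask_first)
{
	struct toy_pmu p = { .overflow = true, .lvtpc_masked = false, .nmis = 0 };

	toy_apic_poll(&p); /* the overflow raises the first NMI */

	if (unmask_first) {
		toy_unmask(&p);     /* old order: unmask while overflow is still set */
		toy_handle_irq(&p);
	} else {
		toy_handle_irq(&p); /* patched order: clear overflow first */
		toy_unmask(&p);
	}
	return p.nmis;
}
```

In this model `toy_run(true)` counts two NMIs for one overflow (the second one has no event left to handle, which is what would show up as an unknown NMI), while `toy_run(false)` counts one. The model says nothing about CPUs other than the P4.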
arch/x86/kernel/cpu/perf_event.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_b
return NOTIFY_DONE;
}
- apic_write(APIC_LVTPC, APIC_DM_NMI);
handled = x86_pmu.handle_irq(args->regs);
+
+ /*
+ * Note the unmasking of LVTPC entry must be
+ * done *after* counter oveflow flag is cleared
+ * otherwise it might lead to double NMIs generation.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+
if (!handled)
return NOTIFY_DONE;
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Ingo Molnar @ 2011-04-14 15:03 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
Cyrill,
* Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> From: Don Zickus <dzickus@redhat.com>
> Subject: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
...
> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
When forwarding patches please always use Signed-off-by, not Acked-by -
especially when you also edit the patch slightly.
Is it fine with you if I add your SOB to this patch?
Thanks,
Ingo
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 15:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/2011 07:03 PM, Ingo Molnar wrote:
>
> Cyrill,
>
> * Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>
>> From: Don Zickus <dzickus@redhat.com>
>> Subject: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
> ...
>> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
>
> When forwarding patches please always use Signed-off-by, not Acked-by -
> especially when you also edit the patch slightly.
>
> Is it fine to you if i add your SOB to this patch?
>
> Thanks,
>
> Ingo
Of course that's fine! Sorry for the inconvenience, Ingo.
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Ingo Molnar @ 2011-04-14 17:43 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
* Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
> +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> @@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_b
> return NOTIFY_DONE;
> }
>
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> handled = x86_pmu.handle_irq(args->regs);
> +
> + /*
> + * Note the unmasking of LVTPC entry must be
> + * done *after* counter oveflow flag is cleared
> + * otherwise it might lead to double NMIs generation.
> + */
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
> +
> if (!handled)
> return NOTIFY_DONE;
>
This breaks 'perf top' on Intel Nehalem and probably other CPUs. The NMI gets
stuck fast on all CPUs:
NMI: 16 6 3 3 3 3 3 3 3 3 3 3 3 3 4 5 Non-maskable interrupts
Thanks,
Ingo
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Ingo Molnar @ 2011-04-14 17:44 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
below is the commit which needs to be fixed - note that i improved the
changelog, please keep these fixes for future submissions.
Thanks,
Ingo
------------------>
From 1b6556e28ea29c2e7fe0eb710b100db035278c32 Mon Sep 17 00:00:00 2001
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 14 Apr 2011 18:48:48 +0400
Subject: [PATCH] perf, x86: Fix spurious 'unknown NMI' messages on Pentium4 systems
When using perf on a Pentium4 box, lots of unknown NMIs are
generated. This is the result of a P4 quirk that is subtle. The
P4 generates an NMI when the counter overflows and unlike other
models where the NMI is a one time event, the P4 continues to
assert its NMI until cleared by the OS.
As a side effect to this quirk, the NMI on the apic is masked
off to prevent a stream of NMIs until the overflow flag is
cleared. During the perf re-design, this subtle-ness was
overlooked and the apic was unmasked _before_ the overflow flag
was cleared. As a result, this generated an extra NMI on P4
machines.
The fix is trivial: wait until the NMI is properly handled
before un-masking the apic.
Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
[ Added a comment into code itself as well. ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/4DA70950.3060102@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/perf_event.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..d3a1902 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_block *self,
return NOTIFY_DONE;
}
- apic_write(APIC_LVTPC, APIC_DM_NMI);
handled = x86_pmu.handle_irq(args->regs);
+
+ /*
+ * Note the unmasking of LVTPC entry must be
+ * done *after* counter oveflow flag is cleared
+ * otherwise it might lead to double NMIs generation.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+
if (!handled)
return NOTIFY_DONE;
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 17:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/2011 09:43 PM, Ingo Molnar wrote:
...
>>
>> handled = x86_pmu.handle_irq(args->regs);
>> +
>> + /*
>> + * Note the unmasking of LVTPC entry must be
>> + * done *after* counter oveflow flag is cleared
>> + * otherwise it might lead to double NMIs generation.
>> + */
>> + apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +
>> if (!handled)
>> return NOTIFY_DONE;
>>
>
> This breaks 'perf top' on Intel Nehalem and probably other CPUs. The NMI gets
> stuck fast on all CPUs:
>
> NMI: 16 6 3 3 3 3 3 3 3 3 3 3 3 3 4 5 Non-maskable interrupts
>
> Thanks,
>
> Ingo
Hmm. Thanks for info. Investigating...
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 17:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lin Ming, Don Zickus, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/2011 09:44 PM, Ingo Molnar wrote:
>
> below is the commit which needs to be fixed - note that i improved the
> changelog, please keep these fixes for future submissions.
>
> Thanks,
>
> Ingo
>
Yes, thanks, I'll ping back as soon as I get the nit resolved.
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Shaun Ruffell @ 2011-04-14 18:12 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Ingo Molnar, Lin Ming, Don Zickus, Maciej Rutecki, Peter Zijlstra,
Stephane Eranian, Robert Richter, lkml
On Thu, Apr 14, 2011 at 09:49:37PM +0400, Cyrill Gorcunov wrote:
> On 04/14/2011 09:44 PM, Ingo Molnar wrote:
>>
>> below is the commit which needs to be fixed - note that i improved the
>> changelog, please keep these fixes for future submissions.
>
> Yes, thanks, i'll ping back as only get nit resolved.
Cyrill, I had not been running perf when I was testing, but I was able to
reproduce what Ingo reported. I'll make sure to run that on any future tests
as well. My apologies.
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 18:14 UTC (permalink / raw)
To: Shaun Ruffell
Cc: Ingo Molnar, Lin Ming, Don Zickus, Maciej Rutecki, Peter Zijlstra,
Stephane Eranian, Robert Richter, lkml
On 04/14/2011 10:12 PM, Shaun Ruffell wrote:
> On Thu, Apr 14, 2011 at 09:49:37PM +0400, Cyrill Gorcunov wrote:
>> On 04/14/2011 09:44 PM, Ingo Molnar wrote:
>>>
>>> below is the commit which needs to be fixed - note that i improved the
>>> changelog, please keep these fixes for future submissions.
>>
>> Yes, thanks, i'll ping back as only get nit resolved.
>
> Cyrill, I had not been running perf when I was testing, but I was able to
> reproduce what Ingo reported. I'll make sure to run that on any future tests
> as well. My apologies.
It's my error, no need for apologies. I suspect this issue might be related
to shared MSRs (iirc Nehalem has some shared MSRs as well). I'm checking which
one might be involved.
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 18:19 UTC (permalink / raw)
To: Shaun Ruffell
Cc: Ingo Molnar, Lin Ming, Don Zickus, Maciej Rutecki, Peter Zijlstra,
Stephane Eranian, Robert Richter, lkml
On 04/14/2011 10:12 PM, Shaun Ruffell wrote:
> On Thu, Apr 14, 2011 at 09:49:37PM +0400, Cyrill Gorcunov wrote:
>> On 04/14/2011 09:44 PM, Ingo Molnar wrote:
>>>
>>> below is the commit which needs to be fixed - note that i improved the
>>> changelog, please keep these fixes for future submissions.
>>
>> Yes, thanks, i'll ping back as only get nit resolved.
>
> Cyrill, I had not been running perf when I was testing, but I was able to
> reproduce what Ingo reported. I'll make sure to run that on any future tests
> as well. My apologies.
Btw, Shaun, perf top won't work on P4 for a while (since it needs nmi-watchdog
to be disabled).
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Don Zickus @ 2011-04-14 18:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Cyrill Gorcunov, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On Thu, Apr 14, 2011 at 07:43:27PM +0200, Ingo Molnar wrote:
>
> * Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>
> > --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
> > +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> > @@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_b
> > return NOTIFY_DONE;
> > }
> >
> > - apic_write(APIC_LVTPC, APIC_DM_NMI);
> >
> > handled = x86_pmu.handle_irq(args->regs);
> > +
> > + /*
> > + * Note the unmasking of LVTPC entry must be
> > + * done *after* counter oveflow flag is cleared
> > + * otherwise it might lead to double NMIs generation.
> > + */
> > + apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +
> > if (!handled)
> > return NOTIFY_DONE;
> >
>
> This breaks 'perf top' on Intel Nehalem and probably other CPUs. The NMI gets
> stuck fast on all CPUs:
>
> NMI: 16 6 3 3 3 3 3 3 3 3 3 3 3 3 4 5 Non-maskable interrupts
Damn it, I was working on getting there. First I did P4s, now I was
working on acme's core2 issues. Nehalem was next on my list, I swear! :-)))))
So this sucks. I'll grab a Nehalem and see what went wrong. It's
probably because of the other 'this seems to work' hacks I put in that
handler. I bet if I clean those up, this problem will be fixed.
I will note that using my patch on a core2quad system, lowered the number
of back-to-back NMIs I was seeing when running a couple of perf records
and a make -j8 (still generates unknown NMIs though :-( ).
Cheers,
Don
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Ingo Molnar @ 2011-04-14 18:45 UTC (permalink / raw)
To: Don Zickus
Cc: Cyrill Gorcunov, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
* Don Zickus <dzickus@redhat.com> wrote:
> On Thu, Apr 14, 2011 at 07:43:27PM +0200, Ingo Molnar wrote:
> >
> > * Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> >
> > > --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
> > > +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
> > > @@ -1370,9 +1370,16 @@ perf_event_nmi_handler(struct notifier_b
> > > return NOTIFY_DONE;
> > > }
> > >
> > > - apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >
> > > handled = x86_pmu.handle_irq(args->regs);
> > > +
> > > + /*
> > > + * Note the unmasking of LVTPC entry must be
> > > + * done *after* counter oveflow flag is cleared
> > > + * otherwise it might lead to double NMIs generation.
> > > + */
> > > + apic_write(APIC_LVTPC, APIC_DM_NMI);
> > > +
> > > if (!handled)
> > > return NOTIFY_DONE;
> > >
> >
> > This breaks 'perf top' on Intel Nehalem and probably other CPUs. The NMI gets
> > stuck fast on all CPUs:
> >
> > NMI: 16 6 3 3 3 3 3 3 3 3 3 3 3 3 4 5 Non-maskable interrupts
>
> Damn it, I was working on getting there. First I did P4s, now I was working
> on acme's core2 issues. Nehalem was next on my list, I swear! :-)))))
>
> So this sucks. I'll grab a Nehalem and see what went wrong. It's probably
> because of the other 'this seems to work' hacks I put in that handler. I bet
> if I clean those up, this problem will be fixed.
>
> I will note that using my patch on a core2quad system, lowered the number of
> back-to-back NMIs I was seeing when running a couple of perf records and a
> make -j8 (still generates unknown NMIs though :-( ).
Here's the cpuinfo:
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X55600 @ 2.80GHz
stepping : 5
cpu MHz : 2794.000
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
initial apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5599.19
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
just in case you have trouble reproducing the problem.
Thanks,
Ingo
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Ingo Molnar @ 2011-04-14 18:46 UTC (permalink / raw)
To: Don Zickus
Cc: Cyrill Gorcunov, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
btw., the bug went away once i removed your patch so it's 100% sure caused by
this change.
Thanks,
Ingo
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: David Ahern @ 2011-04-14 19:35 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Shaun Ruffell, Ingo Molnar, Lin Ming, Don Zickus, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/11 12:19, Cyrill Gorcunov wrote:
> On 04/14/2011 10:12 PM, Shaun Ruffell wrote:
> Btw, Shaun, perf top wont work on P4 for a while (since it needs nmi-watchdog
> to be disabled).
>
perf-top defaults to H/W cycles event. Just need to force it to S/W
events (the fallback for systems that do not support cycles event):
perf top -e cpu-clock
David
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 19:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: Don Zickus, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/2011 10:46 PM, Ingo Molnar wrote:
>
> btw., the bug went away once i removed your patch so it's 100% sure caused by
> this change.
>
> Thanks,
>
> Ingo
Ingo, if you have a chance, would you mind giving this patch a shot? It seems we might
miss the unmasking for in-flight NMIs.
--
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1365,14 +1365,15 @@ perf_event_nmi_handler(struct notifier_b
* will be empty and daze the CPU. So, we drop it to
* avoid false-positive 'unknown nmi' messages.
*/
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
return NOTIFY_STOP;
default:
return NOTIFY_DONE;
}
- apic_write(APIC_LVTPC, APIC_DM_NMI);
handled = x86_pmu.handle_irq(args->regs);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
if (!handled)
return NOTIFY_DONE;
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Don Zickus @ 2011-04-14 19:57 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Ingo Molnar, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On Thu, Apr 14, 2011 at 11:43:25PM +0400, Cyrill Gorcunov wrote:
> On 04/14/2011 10:46 PM, Ingo Molnar wrote:
> >
> > btw., the bug went away once i removed your patch so it's 100% sure caused by
> > this change.
> >
> > Thanks,
> >
> > Ingo
>
> Ingo if you have a chance mind to give this patch a shot please? Seems we might miss unmasking
> for inflight nmis.
I don't think this patch will work. It would make sense if the unmasking
happened _after_ the "if (!handled)" path, but that is not the path Ingo
wanted for v1.
Cheers,
Don
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> handled = x86_pmu.handle_irq(args->regs);
> + apic_write(APIC_LVTPC, APIC_DM_NMI);
^^^^ all handled/unhandled NMIs hit that apic_write
> if (!handled)
> return NOTIFY_DONE;
>
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 20:05 UTC (permalink / raw)
To: Don Zickus
Cc: Ingo Molnar, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/14/2011 11:57 PM, Don Zickus wrote:
> On Thu, Apr 14, 2011 at 11:43:25PM +0400, Cyrill Gorcunov wrote:
>> On 04/14/2011 10:46 PM, Ingo Molnar wrote:
>>>
>>> btw., the bug went away once i removed your patch so it's 100% sure caused by
>>> this change.
>>>
>>> Thanks,
>>>
>>> Ingo
>>
>> Ingo if you have a chance mind to give this patch a shot please? Seems we might miss unmasking
>> for inflight nmis.
>
> I don't think this patch will work. It would make sense if the unmasking
> happened _after_ the "if (!handled)" path, but that is not the path Ingo
> wanted for v1.
This happens when an in-flight NMI reaches the system. Note that the in-flight
NMI comes from perf and masks the LVT entry; it has nothing to do with "handled" but
rather with the _fact_ that the NMI reached the apic via LVTPC and, as a result, masked it.
Don, I might be missing something, my brain is slowly going to sleep :)
>
> Cheers,
> Don
>
>> - apic_write(APIC_LVTPC, APIC_DM_NMI);
>>
>> handled = x86_pmu.handle_irq(args->regs);
>> + apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> ^^^^ all handled/unhandled NMIs hit that apic_write
>> if (!handled)
>> return NOTIFY_DONE;
>>
Yeah, Ingo asked to do it this way, i.e. as in your earlier
patch; the conditional unmasking is left to be tested in a later
kernel series.
--
Cyrill
* Re: [PATCH -tip] perf, x86: fix unknown NMIs on a Pentium4 box
From: Cyrill Gorcunov @ 2011-04-14 20:18 UTC (permalink / raw)
To: Don Zickus
Cc: Ingo Molnar, Lin Ming, Shaun Ruffell, Maciej Rutecki,
Peter Zijlstra, Stephane Eranian, Robert Richter, lkml
On 04/15/2011 12:05 AM, Cyrill Gorcunov wrote:
...
>>
>> I don't think this patch will work. It would make sense if the unmasking
>> happened _after_ the "if (!handled)" path, but that is not the path Ingo
>> wanted for v1.
>
> This thing happened if inflight nmi reaches the system and note that inflight
> NMI comes from perf and masks lvt entry, it has nothing to do with "handled" but
> rather the _fact_ that NMI reached apic via LVTPC and as result -- masked it.
> Don, I might be missin something, brain is slowly going to sleep :)
>
It seems I'm wrong in this assumption; otherwise, when this mechanism was
first introduced, every in-flight NMI caught would have blocked further
perf activity.
>>
>> Cheers,
>> Don
>>
--
Cyrill