* [PATCH 1/4] perf, x86: Try to handle unknown nmis with an enabled PMU
2010-09-01 2:56 [PATCH 0/4] nmi perf fixes Don Zickus
@ 2010-09-01 2:56 ` Don Zickus
2010-09-01 2:56 ` [PATCH 2/4] perf, x86: Fix handle_irq return values Don Zickus
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2010-09-01 2:56 UTC (permalink / raw)
To: mingo
Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
ying.huang, ming.m.lin, yinghai, andi, Peter Zijlstra
From: Robert Richter <robert.richter@amd.com>
When the PMU is enabled it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.
The solution to avoid an 'unknown nmi' massage in this case was simply
to stop the nmi handler chain when the PMU is enabled by stating
the nmi was handled. This has the drawback that a) we can not detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.
This patch addresses this. Now, we check this unknown NMI if it could
be a PMU back-to-back NMI. Otherwise we pass it and let the kernel
handle the unknown nmi.
This is a debug log:
cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Deltas:
nmi #32334 340186
nmi #32335 1327704
nmi #32336 1819 <<<< back-to-back nmi [1]
nmi #32337 85961
nmi #32338 284507
nmi #32339 1578809
nmi #32340 217616
nmi #32341 1798 <<<< back-to-back nmi [2]
nmi #32342 240913
nmi #32343 1512809
nmi #32344 116672
nmi #32345 412453
nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
For back-to-back nmi detection there are the following rules:
The PMU nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2] above).
There is another case if there are two subsequent back-to-back nmis
[3]. The 2nd is detected as back-to-back because the first handled
more than one counter. If the second handles one counter and the 3rd
handles nothing, we drop the 3rd nmi because it could be a
back-to-back nmi.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <20100817152225.GQ26154@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/perf_event.c | 59 +++++++++++++++++++++++++++++--------
1 files changed, 46 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..dd2fceb 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
/*
* event overflow
*/
- handled = 1;
+ handled++;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
@@ -1200,12 +1200,20 @@ void perf_events_lapic_init(void)
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+struct pmu_nmi_state {
+ unsigned int marked;
+ int handled;
+};
+
+static DEFINE_PER_CPU(struct pmu_nmi_state, nmi);
+
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
unsigned long cmd, void *__args)
{
struct die_args *args = __args;
- struct pt_regs *regs;
+ unsigned int this_nmi;
+ int handled;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
@@ -1214,22 +1222,47 @@ perf_event_nmi_handler(struct notifier_block *self,
case DIE_NMI:
case DIE_NMI_IPI:
break;
-
+ case DIE_NMIUNKNOWN:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (this_nmi != __get_cpu_var(nmi).marked)
+ /* let the kernel handle the unknown nmi */
+ return NOTIFY_DONE;
+ /*
+ * This one is a PMU back-to-back nmi. Two events
+ * trigger 'simultaneously' raising two back-to-back
+ * NMIs. If the first NMI handles both, the latter
+ * will be empty and daze the CPU. So, we drop it to
+ * avoid false-positive 'unknown nmi' messages.
+ */
+ return NOTIFY_STOP;
default:
return NOTIFY_DONE;
}
- regs = args->regs;
-
apic_write(APIC_LVTPC, APIC_DM_NMI);
- /*
- * Can't rely on the handled return value to say it was our NMI, two
- * events could trigger 'simultaneously' raising two back-to-back NMIs.
- *
- * If the first NMI handles both, the latter will be empty and daze
- * the CPU.
- */
- x86_pmu.handle_irq(regs);
+
+ handled = x86_pmu.handle_irq(args->regs);
+ if (!handled)
+ return NOTIFY_DONE;
+
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if ((handled > 1) ||
+ /* the next nmi could be a back-to-back nmi */
+ ((__get_cpu_var(nmi).marked == this_nmi) &&
+ (__get_cpu_var(nmi).handled > 1))) {
+ /*
+ * We could have two subsequent back-to-back nmis: The
+ * first handles more than one counter, the 2nd
+ * handles only one counter and the 3rd handles no
+ * counter.
+ *
+ * This is the 2nd nmi because the previous was
+ * handling more than one counter. We will mark the
+ * next (3rd) and then drop it if unhandled.
+ */
+ __get_cpu_var(nmi).marked = this_nmi + 1;
+ __get_cpu_var(nmi).handled = handled;
+ }
return NOTIFY_STOP;
}
--
1.7.2.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 2/4] perf, x86: Fix handle_irq return values
2010-09-01 2:56 [PATCH 0/4] nmi perf fixes Don Zickus
2010-09-01 2:56 ` [PATCH 1/4] perf, x86: Try to handle unknown nmis with an enabled PMU Don Zickus
@ 2010-09-01 2:56 ` Don Zickus
2010-09-01 2:56 ` [PATCH 3/4] [x86] perf: rename nmi variable to avoid clash with entry point Don Zickus
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2010-09-01 2:56 UTC (permalink / raw)
To: mingo
Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
ying.huang, ming.m.lin, yinghai, andi, Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Now that we rely on the number of handled overflows, ensure all handle_irq
implementations actually return the right number.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1282228033.2605.204.camel@laptop>
---
arch/x86/kernel/cpu/perf_event_intel.c | 9 +++++++--
arch/x86/kernel/cpu/perf_event_p4.c | 2 +-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index d8d86d0..4539b4b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -713,6 +713,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
struct cpu_hw_events *cpuc;
int bit, loops;
u64 ack, status;
+ int handled = 0;
perf_sample_data_init(&data, 0);
@@ -743,12 +744,16 @@ again:
/*
* PEBS overflow sets bit 62 in the global status register
*/
- if (__test_and_clear_bit(62, (unsigned long *)&status))
+ if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+ handled++;
x86_pmu.drain_pebs(regs);
+ }
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ handled++;
+
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -772,7 +777,7 @@ again:
done:
intel_pmu_enable_all(0);
- return 1;
+ return handled;
}
static struct event_constraint *
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index febb12c..d470c91 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -690,7 +690,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
inc_irq_stat(apic_perf_irqs);
}
- return handled > 0;
+ return handled;
}
/*
--
1.7.2.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 3/4] [x86] perf: rename nmi variable to avoid clash with entry point
2010-09-01 2:56 [PATCH 0/4] nmi perf fixes Don Zickus
2010-09-01 2:56 ` [PATCH 1/4] perf, x86: Try to handle unknown nmis with an enabled PMU Don Zickus
2010-09-01 2:56 ` [PATCH 2/4] perf, x86: Fix handle_irq return values Don Zickus
@ 2010-09-01 2:56 ` Don Zickus
2010-09-01 2:56 ` [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter Don Zickus
2010-09-01 8:38 ` [PATCH 0/4] nmi perf fixes Robert Richter
4 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2010-09-01 2:56 UTC (permalink / raw)
To: mingo
Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
ying.huang, ming.m.lin, yinghai, andi, Don Zickus
There is already an entry point named .nmi in entry.S and that seems to clash
with the per_cpu variable nmi defined in commit f3a860d8. Renaming this
variable avoids the namespace collision.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/cpu/perf_event.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index dd2fceb..2a05ea4 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1205,7 +1205,7 @@ struct pmu_nmi_state {
int handled;
};
-static DEFINE_PER_CPU(struct pmu_nmi_state, nmi);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct pmu_nmi_state, pmu_nmi);
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
@@ -1224,7 +1224,7 @@ perf_event_nmi_handler(struct notifier_block *self,
break;
case DIE_NMIUNKNOWN:
this_nmi = percpu_read(irq_stat.__nmi_count);
- if (this_nmi != __get_cpu_var(nmi).marked)
+ if (this_nmi != __get_cpu_var(pmu_nmi).marked)
/* let the kernel handle the unknown nmi */
return NOTIFY_DONE;
/*
@@ -1248,8 +1248,8 @@ perf_event_nmi_handler(struct notifier_block *self,
this_nmi = percpu_read(irq_stat.__nmi_count);
if ((handled > 1) ||
/* the next nmi could be a back-to-back nmi */
- ((__get_cpu_var(nmi).marked == this_nmi) &&
- (__get_cpu_var(nmi).handled > 1))) {
+ ((__get_cpu_var(pmu_nmi).marked == this_nmi) &&
+ (__get_cpu_var(pmu_nmi).handled > 1))) {
/*
* We could have two subsequent back-to-back nmis: The
* first handles more than one counter, the 2nd
@@ -1260,8 +1260,8 @@ perf_event_nmi_handler(struct notifier_block *self,
* handling more than one counter. We will mark the
* next (3rd) and then drop it if unhandled.
*/
- __get_cpu_var(nmi).marked = this_nmi + 1;
- __get_cpu_var(nmi).handled = handled;
+ __get_cpu_var(pmu_nmi).marked = this_nmi + 1;
+ __get_cpu_var(pmu_nmi).handled = handled;
}
return NOTIFY_STOP;
--
1.7.2.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
2010-09-01 2:56 [PATCH 0/4] nmi perf fixes Don Zickus
` (2 preceding siblings ...)
2010-09-01 2:56 ` [PATCH 3/4] [x86] perf: rename nmi variable to avoid clash with entry point Don Zickus
@ 2010-09-01 2:56 ` Don Zickus
2010-09-01 8:38 ` [PATCH 0/4] nmi perf fixes Robert Richter
4 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2010-09-01 2:56 UTC (permalink / raw)
To: mingo
Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
ying.huang, ming.m.lin, yinghai, andi, Don Zickus
During testing of a patch to stop having the perf subsytem swallow nmis,
it was uncovered that Nehalem boxes were randomly getting unknown nmis
when using the perf tool.
Moving the ack'ing of the PMI closer to when we get the status allows
the hardware to properly re-set the PMU bit signaling another PMI was
triggered during the processing of the first PMI. This allows the new
logic for dealing with the shortcomings of multiple PMIs to handle the
extra NMI by 'eat'ing it later.
Now one can wonder why are we getting a second PMI when we disable all
the PMUs in the begining of the NMI handler to prevent such a case, for
that I do not know. But I know the fix below helps deal with this quirk.
Tested on multiple Nehalems where the problem was occuring. With the
patch, the code now loops a second time to handle the second PMI (whereas
before it was not).
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 4539b4b..ee05c90 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -712,7 +712,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
struct perf_sample_data data;
struct cpu_hw_events *cpuc;
int bit, loops;
- u64 ack, status;
+ u64 status;
int handled = 0;
perf_sample_data_init(&data, 0);
@@ -729,6 +729,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
loops = 0;
again:
+ intel_pmu_ack_status(status);
if (++loops > 100) {
WARN_ONCE(1, "perfevents: irq loop stuck!\n");
perf_event_print_debug();
@@ -737,7 +738,6 @@ again:
}
inc_irq_stat(apic_perf_irqs);
- ack = status;
intel_pmu_lbr_read();
@@ -766,8 +766,6 @@ again:
x86_pmu_stop(event);
}
- intel_pmu_ack_status(ack);
-
/*
* Repeat if there is more work to be done:
*/
--
1.7.2.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH 0/4] nmi perf fixes
2010-09-01 2:56 [PATCH 0/4] nmi perf fixes Don Zickus
` (3 preceding siblings ...)
2010-09-01 2:56 ` [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter Don Zickus
@ 2010-09-01 8:38 ` Robert Richter
4 siblings, 0 replies; 6+ messages in thread
From: Robert Richter @ 2010-09-01 8:38 UTC (permalink / raw)
To: Don Zickus
Cc: mingo@elte.hu, peterz@infradead.org, gorcunov@gmail.com,
fweisbec@gmail.com, linux-kernel@vger.kernel.org,
ying.huang@intel.com, ming.m.lin@intel.com, yinghai@kernel.org,
andi@firstfloor.org
On 31.08.10 22:56:42, Don Zickus wrote:
> Fixes to allow unknown nmis to pass through the perf nmi handler instead
> of being swallowed. Contains patches that are already in Ingo's tree. Added
> here for completeness. Based on ingo/tip
>
> Tested on intel/amd
>
> git pull git://fedorapeople.org/dzickus/
>
> Don Zickus (2):
> [x86] perf: rename nmi variable to avoid clash with entry point
> [x86] perf: fix accidentally ack'ing a second event on intel perf
> counter
>
> Peter Zijlstra (1):
> perf, x86: Fix handle_irq return values
>
> Robert Richter (1):
> perf, x86: Try to handle unknown nmis with an enabled PMU
>
> arch/x86/kernel/cpu/perf_event.c | 59 +++++++++++++++++++++++++-------
> arch/x86/kernel/cpu/perf_event_intel.c | 15 +++++---
> arch/x86/kernel/cpu/perf_event_p4.c | 2 +-
> 3 files changed, 56 insertions(+), 20 deletions(-)
Reviewed and tested. Looks good to me, thanks Don.
-Robert
>
> --
> 1.7.2.2
>
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 6+ messages in thread