Re: perf failed with kernel 2.6.35-rc

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	Stephane Eranian <eranian@google.com>
Subject: Re: perf failed with kernel 2.6.35-rc
Date: Tue, 03 Aug 2010 14:20:43 +0200	[thread overview]
Message-ID: <1280838043.1923.587.camel@laptop> (raw)
In-Reply-To: <1279008849.2096.913.camel@ymzhang.sh.intel.com>

On Tue, 2010-07-13 at 16:14 +0800, Zhang, Yanmin wrote:

> If I comment out the workaround in function intel_pmu_nhm_enable_all,
> perf could work.
> 
> A quick glance shows:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> should be:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);

Correct, I've got a patch for that laying around,.. posted it several
times in the struct pmu rework series.

> I triggered sysrq to dump PMU registers and found the last bit of
> global status register is 1. I added a status reset operation like below patch:
> 
> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c	2010-07-14 09:38:11.000000000 +0800
> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c	2010-07-14 14:41:42.000000000 +0800
> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
>  		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>  		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>  
> -		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> +		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>  		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> +		/*
> +		 * Reset the last 3 bits of global status register in case
> +		 * previous enabling causes overflows.
> +		 */
> +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);

Like Stephane already pointed out (and the comment above this function
mentions), these counters should be non-counting, and thus should not
cause an overflow (and this function gets called with GLOBAL_CTRL
cleared, so it shouldn't be counting to begin with).

The Nehalem-EX errata BA72 seems to agree (assuming Xeon-7500 is indeed
the EX, Intel really should do something about this naming madness)
http://www.intel.com/Assets/en_US/PDF/specupdate/323344.pdf

[ Also see the trace outout below, PERFCTR[12] stay 0 ]

>  		for (i = 0; i < 3; i++) {
>  			struct perf_event *event = cpuc->events[i];
> 
> 
> 
> However, it still doesn't work. Current right way is to comment out
> the workaround.

So I frobbed the below patchlet to debug things, as it appears I can
indeed reproduce on my westmere.

---
 arch/x86/kernel/cpu/perf_event.c       |   24 +++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..2ca41f7 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,7 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/compat.h>
 
-#if 0
+#if 1
 #undef wrmsrl
 #define wrmsrl(msr, val) 					\
 do {								\
@@ -42,6 +42,16 @@ do {								\
 } while (0)
 #endif
 
+#if 1
+#undef rdmsrl
+#define rdmsrl(msr, val)					\
+do {								\
+	(val) = native_read_msr(msr);				\
+	trace_printk("rdmsrl(%lx, %lx)\n", (unsigned long)(msr),\
+			(unsigned long)(val));			\
+} while (0)
+#endif
+
 /*
  * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
  */
@@ -583,6 +593,14 @@ static void x86_pmu_disable_all(void)
 	}
 }
 
+static void noinline check_status(void)
+{
+	u64 status;
+	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
+	if (WARN_ON(status))
+		perf_event_print_debug();
+}
+
 void hw_perf_disable(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -598,6 +616,8 @@ void hw_perf_disable(void)
 	barrier();
 
 	x86_pmu.disable_all();
+
+	check_status();
 }
 
 static void x86_pmu_enable_all(int added)
@@ -816,6 +836,8 @@ void hw_perf_enable(void)
 	if (cpuc->enabled)
 		return;
 
+	check_status();
+
 	if (cpuc->n_added) {
 		int n_running = cpuc->n_events - cpuc->n_added;
 		/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 214ac86..56ce7dc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -505,7 +505,7 @@ static void intel_pmu_nhm_enable_all(int added)
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
 		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
 		for (i = 0; i < 3; i++) {
---

It results in the below trace (edited out some superfluous stack traces:

           <...>-1871  [000]    96.759853: intel_pmu_disable_all: wrmsrl(38f, 0)
           <...>-1871  [000]    96.759853: <stack trace>
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer
 => __run_hrtimer
           <...>-1871  [000]    96.759872: check_status: rdmsrl(38e, 1)
           <...>-1871  [000]    96.759872: <stack trace>
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer
 => __run_hrtimer
           <...>-1871  [000]    96.760180: perf_event_print_debug: rdmsrl(38f, 0)
           <...>-1871  [000]    96.760186: perf_event_print_debug: rdmsrl(38e, 1)
           <...>-1871  [000]    96.760191: perf_event_print_debug: rdmsrl(390, 0)
           <...>-1871  [000]    96.760196: perf_event_print_debug: rdmsrl(38d, 0)
           <...>-1871  [000]    96.760201: perf_event_print_debug: rdmsrl(3f1, 0)
           <...>-1871  [000]    96.760248: perf_event_print_debug: rdmsrl(186, 53003c)
           <...>-1871  [000]    96.760258: perf_event_print_debug: rdmsrl(c1, d1e91)
           <...>-1871  [000]    96.760285: perf_event_print_debug: rdmsrl(187, 4300b1)
           <...>-1871  [000]    96.760290: perf_event_print_debug: rdmsrl(c2, 0)
           <...>-1871  [000]    96.760320: perf_event_print_debug: rdmsrl(188, 4300b5)
           <...>-1871  [000]    96.760340: perf_event_print_debug: rdmsrl(c3, 0)
           <...>-1871  [000]    96.760359: perf_event_print_debug: rdmsrl(189, 0)
           <...>-1871  [000]    96.760379: perf_event_print_debug: rdmsrl(c4, 0)
           <...>-1871  [000]    96.760398: perf_event_print_debug: rdmsrl(309, 0)
           <...>-1871  [000]    96.760417: perf_event_print_debug: rdmsrl(30a, 0)
           <...>-1871  [000]    96.760432: perf_event_print_debug: rdmsrl(30b, 0)
           <...>-1871  [000]    96.760432: <stack trace>
 => check_status
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer

Which seems to indicate we somehow disable the PMU (clear GLOBAL_CTRL)
with a pending overflow (as indicated by GLOBAL_STATUS and PERFCTR0).

So there's two questions:
 1) how can we end up in the above state
 2) why does the fixup change things

most confusing..

next prev parent reply	other threads:[~2010-08-03 12:20 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-13  8:14 perf failed with kernel 2.6.35-rc Zhang, Yanmin
2010-07-13  8:50 ` Ingo Molnar
2010-07-14  0:49   ` Zhang, Yanmin
2010-07-14  8:15     ` Zhang, Yanmin
2010-07-13 15:16 ` Stephane Eranian
2010-07-14  0:13   ` Zhang, Yanmin
2010-07-14  0:36     ` Stephane Eranian
2010-07-14  2:22       ` Zhang, Yanmin
2010-08-03 12:20 ` Peter Zijlstra [this message]
2010-08-04 15:48   ` Stephane Eranian
     [not found]   ` <1280886349.2125.32.camel@ymzhang.sh.intel.com>
     [not found]     ` <1280905701.1923.717.camel@laptop>
     [not found]       ` <1280990413.2125.50.camel@ymzhang.sh.intel.com>
     [not found]         ` <1281004361.1923.1750.camel@laptop>
2010-08-06  5:39           ` Zhang, Yanmin
2010-08-18 10:27             ` [tip:perf/urgent] perf, x86: Fix Intel-nhm PMU programming errata workaround tip-bot for Zhang, Yanmin

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:f2da20f dfblob:2ca41f7 dfblob:214ac86 dfblob:56ce7dc )
 OR (
bs:"Re: perf failed with kernel 2.6.35-rc" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1280838043.1923.587.camel@laptop \
    --to=peterz@infradead.org \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.