public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	Stephane Eranian <eranian@google.com>
Subject: Re: perf failed with kernel 2.6.35-rc
Date: Tue, 03 Aug 2010 14:20:43 +0200	[thread overview]
Message-ID: <1280838043.1923.587.camel@laptop> (raw)
In-Reply-To: <1279008849.2096.913.camel@ymzhang.sh.intel.com>

On Tue, 2010-07-13 at 16:14 +0800, Zhang, Yanmin wrote:

> If I comment out the workaround in function intel_pmu_nhm_enable_all,
> perf could work.
> 
> A quick glance shows:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> should be:
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);

Correct, I've got a patch for that laying around,.. posted it several
times in the struct pmu rework series.

> I triggered sysrq to dump PMU registers and found the last bit of
> global status register is 1. I added a status reset operation like below patch:
> 
> --- linux-2.6.35-rc5/arch/x86/kernel/cpu/perf_event_intel.c	2010-07-14 09:38:11.000000000 +0800
> +++ linux-2.6.35-rc5_fork/arch/x86/kernel/cpu/perf_event_intel.c	2010-07-14 14:41:42.000000000 +0800
> @@ -505,8 +505,13 @@ static void intel_pmu_nhm_enable_all(int
>  		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
>  		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
>  
> -		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
> +		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
>  		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
> +		/*
> +		 * Reset the last 3 bits of global status register in case
> +		 * previous enabling causes overflows.
> +		 */
> +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0x7);

Like Stephane already pointed out (and the comment above this function
mentions), these counters should be non-counting, and thus should not
cause an overflow (and this function gets called with GLOBAL_CTRL
cleared, so it shouldn't be counting to begin with).

The Nehalem-EX errata BA72 seems to agree (assuming Xeon-7500 is indeed
the EX, Intel really should do something about this naming madness)
http://www.intel.com/Assets/en_US/PDF/specupdate/323344.pdf

[ Also see the trace outout below, PERFCTR[12] stay 0 ]

>  		for (i = 0; i < 3; i++) {
>  			struct perf_event *event = cpuc->events[i];
> 
> 
> 
> However, it still doesn't work. Current right way is to comment out
> the workaround.

So I frobbed the below patchlet to debug things, as it appears I can
indeed reproduce on my westmere.

---
 arch/x86/kernel/cpu/perf_event.c       |   24 +++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..2ca41f7 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,7 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/compat.h>
 
-#if 0
+#if 1
 #undef wrmsrl
 #define wrmsrl(msr, val) 					\
 do {								\
@@ -42,6 +42,16 @@ do {								\
 } while (0)
 #endif
 
+#if 1
+#undef rdmsrl
+#define rdmsrl(msr, val)					\
+do {								\
+	(val) = native_read_msr(msr);				\
+	trace_printk("rdmsrl(%lx, %lx)\n", (unsigned long)(msr),\
+			(unsigned long)(val));			\
+} while (0)
+#endif
+
 /*
  * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
  */
@@ -583,6 +593,14 @@ static void x86_pmu_disable_all(void)
 	}
 }
 
+static void noinline check_status(void)
+{
+	u64 status;
+	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
+	if (WARN_ON(status))
+		perf_event_print_debug();
+}
+
 void hw_perf_disable(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -598,6 +616,8 @@ void hw_perf_disable(void)
 	barrier();
 
 	x86_pmu.disable_all();
+
+	check_status();
 }
 
 static void x86_pmu_enable_all(int added)
@@ -816,6 +836,8 @@ void hw_perf_enable(void)
 	if (cpuc->enabled)
 		return;
 
+	check_status();
+
 	if (cpuc->n_added) {
 		int n_running = cpuc->n_events - cpuc->n_added;
 		/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 214ac86..56ce7dc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -505,7 +505,7 @@ static void intel_pmu_nhm_enable_all(int added)
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
 		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
 		for (i = 0; i < 3; i++) {
---

It results in the below trace (edited out some superfluous stack traces:

           <...>-1871  [000]    96.759853: intel_pmu_disable_all: wrmsrl(38f, 0)
           <...>-1871  [000]    96.759853: <stack trace>
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer
 => __run_hrtimer
           <...>-1871  [000]    96.759872: check_status: rdmsrl(38e, 1)
           <...>-1871  [000]    96.759872: <stack trace>
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer
 => __run_hrtimer
           <...>-1871  [000]    96.760180: perf_event_print_debug: rdmsrl(38f, 0)
           <...>-1871  [000]    96.760186: perf_event_print_debug: rdmsrl(38e, 1)
           <...>-1871  [000]    96.760191: perf_event_print_debug: rdmsrl(390, 0)
           <...>-1871  [000]    96.760196: perf_event_print_debug: rdmsrl(38d, 0)
           <...>-1871  [000]    96.760201: perf_event_print_debug: rdmsrl(3f1, 0)
           <...>-1871  [000]    96.760248: perf_event_print_debug: rdmsrl(186, 53003c)
           <...>-1871  [000]    96.760258: perf_event_print_debug: rdmsrl(c1, d1e91)
           <...>-1871  [000]    96.760285: perf_event_print_debug: rdmsrl(187, 4300b1)
           <...>-1871  [000]    96.760290: perf_event_print_debug: rdmsrl(c2, 0)
           <...>-1871  [000]    96.760320: perf_event_print_debug: rdmsrl(188, 4300b5)
           <...>-1871  [000]    96.760340: perf_event_print_debug: rdmsrl(c3, 0)
           <...>-1871  [000]    96.760359: perf_event_print_debug: rdmsrl(189, 0)
           <...>-1871  [000]    96.760379: perf_event_print_debug: rdmsrl(c4, 0)
           <...>-1871  [000]    96.760398: perf_event_print_debug: rdmsrl(309, 0)
           <...>-1871  [000]    96.760417: perf_event_print_debug: rdmsrl(30a, 0)
           <...>-1871  [000]    96.760432: perf_event_print_debug: rdmsrl(30b, 0)
           <...>-1871  [000]    96.760432: <stack trace>
 => check_status
 => hw_perf_disable
 => perf_disable
 => perf_ctx_adjust_freq
 => perf_event_task_tick
 => scheduler_tick
 => update_process_times
 => tick_sched_timer

Which seems to indicate we somehow disable the PMU (clear GLOBAL_CTRL)
with a pending overflow (as indicated by GLOBAL_STATUS and PERFCTR0).

So there's two questions:
 1) how can we end up in the above state
 2) why does the fixup change things

most confusing..

  parent reply	other threads:[~2010-08-03 12:20 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-13  8:14 perf failed with kernel 2.6.35-rc Zhang, Yanmin
2010-07-13  8:50 ` Ingo Molnar
2010-07-14  0:49   ` Zhang, Yanmin
2010-07-14  8:15     ` Zhang, Yanmin
2010-07-13 15:16 ` Stephane Eranian
2010-07-14  0:13   ` Zhang, Yanmin
2010-07-14  0:36     ` Stephane Eranian
2010-07-14  2:22       ` Zhang, Yanmin
2010-08-03 12:20 ` Peter Zijlstra [this message]
2010-08-04 15:48   ` Stephane Eranian
     [not found]   ` <1280886349.2125.32.camel@ymzhang.sh.intel.com>
     [not found]     ` <1280905701.1923.717.camel@laptop>
     [not found]       ` <1280990413.2125.50.camel@ymzhang.sh.intel.com>
     [not found]         ` <1281004361.1923.1750.camel@laptop>
2010-08-06  5:39           ` Zhang, Yanmin
2010-08-18 10:27             ` [tip:perf/urgent] perf, x86: Fix Intel-nhm PMU programming errata workaround tip-bot for Zhang, Yanmin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1280838043.1923.587.camel@laptop \
    --to=peterz@infradead.org \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox