public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: eranian@gmail.com
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Robert Richter <robert.richter@amd.com>,
	Paul Mackerras <paulus@samba.org>,
	Andi Kleen <andi@firstfloor.org>,
	Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <cel@us.ibm.com>,
	Corey J Ashford <cjashfor@us.ibm.com>,
	Philip Mucci <mucci@eecs.utk.edu>,
	Dan Terpstra <terpstra@eecs.utk.edu>,
	perfmon2-devel <perfmon2-devel@lists.sourceforge.net>
Subject: Re: I.5 - Mmaped count
Date: Mon, 22 Jun 2009 16:39:32 +0200	[thread overview]
Message-ID: <1245681572.19816.481.camel@twins> (raw)
In-Reply-To: <7c86c4470906220554m1da6ebe7h50444db6ab606aa1@mail.gmail.com>

On Mon, 2009-06-22 at 14:54 +0200, stephane eranian wrote:
> On Mon, Jun 22, 2009 at 2:35 PM, Peter Zijlstra<a.p.zijlstra@chello.nl> wrote:
> > On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote:
> >> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar<mingo@elte.hu> wrote:
> >> >> 5/ Mmaped count
> >> >>
> >> >> It is possible to read counts directly from user space for
> >> >> self-monitoring threads. This leverages a HW capability present on
> >> >> some processors. On X86, this is possible via RDPMC.
> >> >>
> >> >> The full 64-bit count is constructed by combining the hardware
> >> >> value extracted with an assembly instruction and a base value made
> >> >> available thru the mmap. There is an atomic generation count
> >> >> available to deal with the race condition.
> >> >>
> >> >> I believe there is a problem with this approach given that the PMU
> >> >> is shared and that events can be multiplexed. That means that even
> >> >> though you are self-monitoring, events get replaced on the PMU.
> >> >> The assembly instruction is unaware of that, it reads a register
> >> >> not an event.
> >> >>
> >> >> On x86, assume event A is hosted in counter 0, thus you need
> >> >> RDPMC(0) to extract the count. But then, the event is replaced by
> >> >> another one which reuses counter 0. At the user level, you will
> >> >> still use RDPMC(0) but it will read the HW value from a different
> >> >> event and combine it with a base count from another one.
> >> >>
> >> >> To avoid this, you need to pin the event so it stays in the PMU at
> >> >> all times. Now, here is something unclear to me. Pinning does not
> >> >> mean stay in the SAME register, it means the event stays on the
> >> >> PMU but it can possibly change register. To prevent that, I
> >> >> believe you need to also set exclusive so that no other group can
> >> >> be scheduled, and thus possibly use the same counter.
> >> >>
> >> >> Looks like this is the only way you can make this actually work.
> >> >> Not setting pinned+exclusive, is another pitfall in which many
> >> >> people will fall into.
> >> >
> >> >   do {
> >> >     seq = pc->lock;
> >> >
> >> >     barrier()
> >> >     if (pc->index) {
> >> >       count = pmc_read(pc->index - 1);
> >> >       count += pc->offset;
> >> >     } else
> >> >       goto regular_read;
> >> >
> >> >     barrier();
> >> >   } while (pc->lock != seq);
> >> >
> >> > We don't see the hole you are referring to. The sequence lock
> >> > ensures you get a consistent view.
> >> >
> >> Let's take an example, with two groups, one event in each group.
> >> Both events scheduled on counter0, i.e,, rdpmc(0). The 2 groups
> >> are multiplexed, one each tick. The user gets 2 file descriptors
> >> and thus two mmap'ed pages.
> >>
> >> Suppose the user wants to read, using the above loop, the value of the
> >> event in the first group BUT it's the 2nd group  that is currently active
> >> and loaded on counter0, i.e., rdpmc(0) returns the value of the 2nd event.
> >>
> >> Unless you tell me that pc->index is marked invalid (0) when the
> >> event is not scheduled. I don't see how you can avoid reading
> >> the wrong value. I am assuming that is the event is not scheduled
> >> lock remains constant.
> >
> > Indeed, pc->index == 0 means its not currently available.
> 
> I don't see where you clear that field on x86.

x86 doesn't have this feature fully implemented yet.. its still on the
todo list. Paulus started this on power, so it should work there.

> Looks like it comes from hwc->idx. I suspect you need
> to do something in x86_pmu_disable() to be symmetrical
> with x86_pmu_enable().

Right.

> I suspect something similar needs to be done on Power.

It looks like the power disable method does indeed do this:

	if (counter->hw.idx) {
		write_pmc(counter->hw.idx, 0);
		counter->hw.idx = 0;
	}
	perf_counter_update_userpage(counter);


The below might suffice for x86.. but its not real nice.
Power already has that whole +1 thing in its ->idx field, x86 does not.
So I either munge x86 or add something like I did below..

Paul, any suggestions?

---
Index: linux-2.6/arch/x86/kernel/cpu/perf_counter.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_counter.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_counter.c
@@ -912,6 +912,8 @@ x86_perf_counter_set_period(struct perf_
 	err = checking_wrmsrl(hwc->counter_base + idx,
 			     (u64)(-left) & x86_pmu.counter_mask);
 
+	perf_counter_update_userpage(counter);
+
 	return ret;
 }
 
@@ -1041,6 +1043,8 @@ try_generic:
 	x86_perf_counter_set_period(counter, hwc, idx);
 	x86_pmu.enable(hwc, idx);
 
+	perf_counter_update_userpage(counter);
+
 	return 0;
 }
 
@@ -1133,6 +1137,8 @@ static void x86_pmu_disable(struct perf_
 	x86_perf_counter_update(counter, hwc, idx);
 	cpuc->counters[idx] = NULL;
 	clear_bit(idx, cpuc->used_mask);
+
+	perf_counter_update_userpage(counter);
 }
 
 /*
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1753,6 +1753,14 @@ int perf_counter_task_disable(void)
 	return 0;
 }
 
+static int perf_counter_index(struct perf_counter *counter)
+{
+	if (counter->state != PERF_COUNTER_STATE_ACTIVE)
+		return 0;
+
+	return counter->hw->idx + 1 - PERF_COUNTER_INDEX_OFFSET;
+}
+
 /*
  * Callers need to ensure there can be no nesting of this function, otherwise
  * the seqlock logic goes bad. We can not serialize this because the arch
@@ -1777,7 +1785,7 @@ void perf_counter_update_userpage(struct
 	preempt_disable();
 	++userpg->lock;
 	barrier();
-	userpg->index = counter->hw.idx;
+	userpg->index = perf_counter_index(counter);
 	userpg->offset = atomic64_read(&counter->count);
 	if (counter->state == PERF_COUNTER_STATE_ACTIVE)
 		userpg->offset -= atomic64_read(&counter->hw.prev_count);
Index: linux-2.6/arch/powerpc/include/asm/perf_counter.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/perf_counter.h
+++ linux-2.6/arch/powerpc/include/asm/perf_counter.h
@@ -61,6 +61,8 @@ struct pt_regs;
 extern unsigned long perf_misc_flags(struct pt_regs *regs);
 extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
 
+#define PERF_COUNTER_INDEX_OFFSET	1
+
 /*
  * Only override the default definitions in include/linux/perf_counter.h
  * if we have hardware PMU support.
Index: linux-2.6/arch/x86/include/asm/perf_counter.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/perf_counter.h
+++ linux-2.6/arch/x86/include/asm/perf_counter.h
@@ -87,6 +87,9 @@ union cpuid10_edx {
 #ifdef CONFIG_PERF_COUNTERS
 extern void init_hw_perf_counters(void);
 extern void perf_counters_lapic_init(void);
+
+#define PERF_COUNTER_INDEX_OFFSET			0
+
 #else
 static inline void init_hw_perf_counters(void)		{ }
 static inline void perf_counters_lapic_init(void)	{ }



  reply	other threads:[~2009-06-22 14:39 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16 17:42 v2 of comments on Performance Counters for Linux (PCL) stephane eranian
2009-06-22 11:48 ` Ingo Molnar
2009-06-22 11:49 ` I.1 - System calls - ioctl Ingo Molnar
2009-06-22 12:58   ` Christoph Hellwig
2009-06-22 13:56     ` Ingo Molnar
2009-06-22 17:41       ` Arnd Bergmann
2009-07-13 10:53     ` Peter Zijlstra
2009-07-13 17:30       ` [perfmon2] " Arnd Bergmann
2009-07-13 17:34         ` Peter Zijlstra
2009-07-13 17:53           ` Arnd Bergmann
2009-07-14 13:51       ` Christoph Hellwig
2009-07-30 13:58       ` stephane eranian
2009-07-30 14:13         ` Peter Zijlstra
2009-07-30 16:17           ` stephane eranian
2009-07-30 16:40             ` Arnd Bergmann
2009-07-30 16:53               ` stephane eranian
2009-07-30 17:20                 ` Arnd Bergmann
2009-08-03 14:22                   ` Peter Zijlstra
2009-06-22 11:50 ` I.2 - Grouping Ingo Molnar
2009-06-22 19:45   ` stephane eranian
2009-06-22 22:04     ` Corey Ashford
2009-06-23 17:51       ` stephane eranian
2009-06-22 21:38   ` Corey Ashford
2009-06-23  5:16   ` Paul Mackerras
2009-06-23  7:36     ` stephane eranian
2009-06-23  8:26       ` Paul Mackerras
2009-06-23  8:30         ` stephane eranian
2009-06-23 16:24           ` Corey Ashford
2009-06-22 11:51 ` I.3 - Multiplexing and system-wide Ingo Molnar
2009-06-22 11:51 ` I.4 - Controlling group multiplexing Ingo Molnar
2009-06-22 11:52 ` I.5 - Mmaped count Ingo Molnar
2009-06-22 12:25   ` stephane eranian
2009-06-22 12:35     ` Peter Zijlstra
2009-06-22 12:54       ` stephane eranian
2009-06-22 14:39         ` Peter Zijlstra [this message]
2009-06-23  0:41         ` Paul Mackerras
2009-06-23  0:39       ` Paul Mackerras
2009-06-23  6:13         ` Peter Zijlstra
2009-06-23  7:40         ` stephane eranian
2009-06-23  0:33     ` Paul Mackerras
2009-06-22 11:53 ` I.6 - Group scheduling Ingo Molnar
2009-06-22 11:54 ` I.7 - Group validity checking Ingo Molnar
2009-06-22 11:54 ` I.8 - Generalized cache events Ingo Molnar
2009-06-22 11:55 ` I.9 - Group reading Ingo Molnar
2009-06-22 11:55 ` I.10 - Event buffer minimal useful size Ingo Molnar
2009-06-22 11:56 ` I.11 - Missing definitions for generic events Ingo Molnar
2009-06-22 14:54   ` stephane eranian
2009-06-22 11:57 ` II.1 - Fixed counters on Intel Ingo Molnar
2009-06-22 14:27   ` stephane eranian
2009-06-22 11:57 ` II.2 - Event knowledge missing Ingo Molnar
2009-06-23 13:18   ` stephane eranian
2009-06-22 11:58 ` III.1 - Sampling period randomization Ingo Molnar
2009-06-22 11:58 ` IV.1 - Support for model-specific uncore PMU Ingo Molnar
2009-06-22 11:59 ` IV.2 - Features impacting all counters Ingo Molnar
2009-06-22 12:00 ` IV.3 - AMD IBS Ingo Molnar
2009-06-22 14:08   ` [perfmon2] " Rob Fowler
2009-06-22 17:58     ` Maynard Johnson
2009-06-23  6:19     ` Peter Zijlstra
2009-06-23  8:19       ` stephane eranian
2009-06-23 14:05         ` Ingo Molnar
2009-06-23 14:25           ` stephane eranian
2009-06-23 14:55             ` Ingo Molnar
2009-06-23 14:40       ` Rob Fowler
2009-06-22 19:17   ` stephane eranian
2009-06-22 12:00 ` IV.4 - Intel PEBS Ingo Molnar
2009-06-22 12:16   ` Andi Kleen
2009-06-22 12:01 ` IV.5 - Intel Last Branch Record (LBR) Ingo Molnar
2009-06-22 20:02   ` stephane eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1245681572.19816.481.camel@twins \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=cel@us.ibm.com \
    --cc=cjashfor@us.ibm.com \
    --cc=eranian@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpjohn@us.ibm.com \
    --cc=mucci@eecs.utk.edu \
    --cc=paulus@samba.org \
    --cc=perfmon2-devel@lists.sourceforge.net \
    --cc=robert.richter@amd.com \
    --cc=terpstra@eecs.utk.edu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox