From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: eranian@gmail.com
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Robert Richter <robert.richter@amd.com>,
Paul Mackerras <paulus@samba.org>,
Andi Kleen <andi@firstfloor.org>,
Maynard Johnson <mpjohn@us.ibm.com>, Carl Love <cel@us.ibm.com>,
Corey J Ashford <cjashfor@us.ibm.com>,
Philip Mucci <mucci@eecs.utk.edu>,
Dan Terpstra <terpstra@eecs.utk.edu>,
perfmon2-devel <perfmon2-devel@lists.sourceforge.net>
Subject: Re: I.5 - Mmaped count
Date: Mon, 22 Jun 2009 14:35:54 +0200 [thread overview]
Message-ID: <1245674154.19816.228.camel@twins> (raw)
In-Reply-To: <7c86c4470906220525x409bedadj29be01236e42ea1@mail.gmail.com>
On Mon, 2009-06-22 at 14:25 +0200, stephane eranian wrote:
> On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar<mingo@elte.hu> wrote:
> >> 5/ Mmaped count
> >>
> >> It is possible to read counts directly from user space for
> >> self-monitoring threads. This leverages a HW capability present on
> >> some processors. On X86, this is possible via RDPMC.
> >>
> >> The full 64-bit count is constructed by combining the hardware
> >> value extracted with an assembly instruction and a base value made
> >> available thru the mmap. There is an atomic generation count
> >> available to deal with the race condition.
> >>
> >> I believe there is a problem with this approach given that the PMU
> >> is shared and that events can be multiplexed. That means that even
> >> though you are self-monitoring, events get replaced on the PMU.
> >> The assembly instruction is unaware of that, it reads a register
> >> not an event.
> >>
> >> On x86, assume event A is hosted in counter 0, thus you need
> >> RDPMC(0) to extract the count. But then, the event is replaced by
> >> another one which reuses counter 0. At the user level, you will
> >> still use RDPMC(0) but it will read the HW value from a different
> >> event and combine it with a base count from another one.
> >>
> >> To avoid this, you need to pin the event so it stays in the PMU at
> >> all times. Now, here is something unclear to me. Pinning does not
> >> mean stay in the SAME register, it means the event stays on the
> >> PMU but it can possibly change register. To prevent that, I
> >> believe you need to also set exclusive so that no other group can
> >> be scheduled, and thus possibly use the same counter.
> >>
> >> Looks like this is the only way you can make this actually work.
> >> Not setting pinned+exclusive, is another pitfall in which many
> >> people will fall into.
> >
> > do {
> > seq = pc->lock;
> >
> > barrier()
> > if (pc->index) {
> > count = pmc_read(pc->index - 1);
> > count += pc->offset;
> > } else
> > goto regular_read;
> >
> > barrier();
> > } while (pc->lock != seq);
> >
> > We don't see the hole you are referring to. The sequence lock
> > ensures you get a consistent view.
> >
> Let's take an example, with two groups, one event in each group.
> Both events scheduled on counter0, i.e,, rdpmc(0). The 2 groups
> are multiplexed, one each tick. The user gets 2 file descriptors
> and thus two mmap'ed pages.
>
> Suppose the user wants to read, using the above loop, the value of the
> event in the first group BUT it's the 2nd group that is currently active
> and loaded on counter0, i.e., rdpmc(0) returns the value of the 2nd event.
>
> Unless you tell me that pc->index is marked invalid (0) when the
> event is not scheduled. I don't see how you can avoid reading
> the wrong value. I am assuming that is the event is not scheduled
> lock remains constant.
Indeed, pc->index == 0 means its not currently available.
> Assuming the event is active when you enter the loop and you
> read a value. How to get the timing information to scale the
> count?
I think we would have to add that do the data page,.. something like the
below?
Paulus?
---
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -232,6 +232,10 @@ struct perf_counter_mmap_page {
__u32 lock; /* seqlock for synchronization */
__u32 index; /* hardware counter identifier */
__s64 offset; /* add to hardware counter value */
+ __u64 total_time; /* total time counter active */
+ __u64 running_time; /* time counter on cpu */
+
+ __u64 __reserved[123]; /* align at 1k */
/*
* Control data for the mmap() data buffer.
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1782,6 +1782,12 @@ void perf_counter_update_userpage(struct
if (counter->state == PERF_COUNTER_STATE_ACTIVE)
userpg->offset -= atomic64_read(&counter->hw.prev_count);
+ userpg->total_time = counter->total_time_enabled +
+ atomic64_read(&counter->child_total_time_enabled);
+
+ userpg->running_time = counter->total_time_running +
+ atomic64_read(&counter->child_total_time_running);
+
barrier();
++userpg->lock;
preempt_enable();
next prev parent reply other threads:[~2009-06-22 12:35 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-16 17:42 v2 of comments on Performance Counters for Linux (PCL) stephane eranian
2009-06-22 11:48 ` Ingo Molnar
2009-06-22 11:49 ` I.1 - System calls - ioctl Ingo Molnar
2009-06-22 12:58 ` Christoph Hellwig
2009-06-22 13:56 ` Ingo Molnar
2009-06-22 17:41 ` Arnd Bergmann
2009-07-13 10:53 ` Peter Zijlstra
2009-07-13 17:30 ` [perfmon2] " Arnd Bergmann
2009-07-13 17:34 ` Peter Zijlstra
2009-07-13 17:53 ` Arnd Bergmann
2009-07-14 13:51 ` Christoph Hellwig
2009-07-30 13:58 ` stephane eranian
2009-07-30 14:13 ` Peter Zijlstra
2009-07-30 16:17 ` stephane eranian
2009-07-30 16:40 ` Arnd Bergmann
2009-07-30 16:53 ` stephane eranian
2009-07-30 17:20 ` Arnd Bergmann
2009-08-03 14:22 ` Peter Zijlstra
2009-06-22 11:50 ` I.2 - Grouping Ingo Molnar
2009-06-22 19:45 ` stephane eranian
2009-06-22 22:04 ` Corey Ashford
2009-06-23 17:51 ` stephane eranian
2009-06-22 21:38 ` Corey Ashford
2009-06-23 5:16 ` Paul Mackerras
2009-06-23 7:36 ` stephane eranian
2009-06-23 8:26 ` Paul Mackerras
2009-06-23 8:30 ` stephane eranian
2009-06-23 16:24 ` Corey Ashford
2009-06-22 11:51 ` I.3 - Multiplexing and system-wide Ingo Molnar
2009-06-22 11:51 ` I.4 - Controlling group multiplexing Ingo Molnar
2009-06-22 11:52 ` I.5 - Mmaped count Ingo Molnar
2009-06-22 12:25 ` stephane eranian
2009-06-22 12:35 ` Peter Zijlstra [this message]
2009-06-22 12:54 ` stephane eranian
2009-06-22 14:39 ` Peter Zijlstra
2009-06-23 0:41 ` Paul Mackerras
2009-06-23 0:39 ` Paul Mackerras
2009-06-23 6:13 ` Peter Zijlstra
2009-06-23 7:40 ` stephane eranian
2009-06-23 0:33 ` Paul Mackerras
2009-06-22 11:53 ` I.6 - Group scheduling Ingo Molnar
2009-06-22 11:54 ` I.7 - Group validity checking Ingo Molnar
2009-06-22 11:54 ` I.8 - Generalized cache events Ingo Molnar
2009-06-22 11:55 ` I.9 - Group reading Ingo Molnar
2009-06-22 11:55 ` I.10 - Event buffer minimal useful size Ingo Molnar
2009-06-22 11:56 ` I.11 - Missing definitions for generic events Ingo Molnar
2009-06-22 14:54 ` stephane eranian
2009-06-22 11:57 ` II.1 - Fixed counters on Intel Ingo Molnar
2009-06-22 14:27 ` stephane eranian
2009-06-22 11:57 ` II.2 - Event knowledge missing Ingo Molnar
2009-06-23 13:18 ` stephane eranian
2009-06-22 11:58 ` III.1 - Sampling period randomization Ingo Molnar
2009-06-22 11:58 ` IV.1 - Support for model-specific uncore PMU Ingo Molnar
2009-06-22 11:59 ` IV.2 - Features impacting all counters Ingo Molnar
2009-06-22 12:00 ` IV.3 - AMD IBS Ingo Molnar
2009-06-22 14:08 ` [perfmon2] " Rob Fowler
2009-06-22 17:58 ` Maynard Johnson
2009-06-23 6:19 ` Peter Zijlstra
2009-06-23 8:19 ` stephane eranian
2009-06-23 14:05 ` Ingo Molnar
2009-06-23 14:25 ` stephane eranian
2009-06-23 14:55 ` Ingo Molnar
2009-06-23 14:40 ` Rob Fowler
2009-06-22 19:17 ` stephane eranian
2009-06-22 12:00 ` IV.4 - Intel PEBS Ingo Molnar
2009-06-22 12:16 ` Andi Kleen
2009-06-22 12:01 ` IV.5 - Intel Last Branch Record (LBR) Ingo Molnar
2009-06-22 20:02 ` stephane eranian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1245674154.19816.228.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cel@us.ibm.com \
--cc=cjashfor@us.ibm.com \
--cc=eranian@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mpjohn@us.ibm.com \
--cc=mucci@eecs.utk.edu \
--cc=paulus@samba.org \
--cc=perfmon2-devel@lists.sourceforge.net \
--cc=robert.richter@amd.com \
--cc=terpstra@eecs.utk.edu \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox