Re: [patch 0/3] [Announcement] Performance Counters for Linux

From: Ingo Molnar <mingo@elte.hu>
To: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Stephane Eranian <eranian@googlemail.com>,
	Eric Dumazet <dada1@cosmosbay.com>,
	Robert Richter <robert.richter@amd.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Peter Anvin <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Steven Rostedt <rostedt@goodmis.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [patch 0/3] [Announcement] Performance Counters for Linux
Date: Fri, 5 Dec 2008 13:07:34 +0100
Message-ID: <20081205120734.GA26244@elte.hu>
In-Reply-To: <18744.61429.548462.667020@cargo.ozlabs.ibm.com>

* Paul Mackerras <paulus@samba.org> wrote:

> Ingo Molnar writes:
> 
> > * Paul Mackerras <paulus@samba.org> wrote:
> [...]
> > > Isn't it two separate read() calls to read the two counters?  If so, 
> > > the only way the two values are actually going to correspond to the 
> > > same point in time is if the task being monitored is stopped.  In which 
> > > case the monitoring task needs to use ptrace or something similar in 
> > > order to make sure that the monitored task is actually stopped.
> > 
> > It doesn't matter in practice.
> 
> Can I ask - and this is a real question, I'm not being sarcastic - is 
> that statement made with substantial serious experience in performance 
> analysis behind it, or is it just an intuition?
> 
> I will happily admit that I am not a great expert on performance 
> analysis with years of experience.  But I have taken a bit of a look at 
> what people with that sort of experience do, and I don't think they 
> would agree with your "doesn't matter" statement.

A stream of read()s being slightly out of step has an order of magnitude 
smaller effect on precision than stopping the monitored task. Look at 
the numbers: on the testbox i have, a read() syscall takes 0.2 
microseconds, while a context-switch takes 2 microseconds on the local 
CPU and about 5-10 microseconds cross-CPU (or more, if the cache pattern 
is unlucky/unaffine). That's 10, 25 or 50 times more expensive: you can 
do 9, 24 or 49 reads and still be cheaper than a single context-switch. 
Compound syscalls are almost never worth the complexity.
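
The arithmetic above can be checked with a minimal sketch. The costs are 
the ones quoted for the testbox, written in nanoseconds so that integer 
division avoids float rounding; the function and variable names are made 
up for illustration.

```python
# Costs quoted in the message, in nanoseconds.
READ_NS = 200  # one read() syscall: 0.2 microseconds
SWITCH_NS = {
    "local CPU": 2_000,          # 2 microseconds
    "cross-CPU (low)": 5_000,    # 5 microseconds
    "cross-CPU (high)": 10_000,  # 10 microseconds
}


def cost_ratio(switch_ns, read_ns=READ_NS):
    """How many times more expensive one context switch is than one read()."""
    return switch_ns // read_ns


def reads_still_cheaper(switch_ns, read_ns=READ_NS):
    """Largest number of read() calls whose total cost stays below one switch."""
    return (switch_ns - 1) // read_ns


for name, ns in SWITCH_NS.items():
    print(f"{name}: {cost_ratio(ns)}x, up to {reads_still_cheaper(ns)} reads")
```

The "cheaper" counts are one less than the ratios because that many reads 
would only break even with the context switch.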

So as a scheduler person i cannot really take the perfmon "ptrace 
approach" seriously, and i explained that in great detail already. It 
clearly came from HPC workload quarters where tasks are persistent 
entities running alone on a single CPU that just use up CPU time there 
and don't interact with each other too much. That's a good and important 
profiling target for sure - but by no means the only workload target to 
design a core kernel facility for. It's an absolutely horrible approach 
for a number of more common workloads.

> > This kind of 'group system call facility' has been suggested several 
> > times in the past - but ... never got anywhere because system calls 
> > are cheap enough that it really does not count in practice.
> > 
> > It could be implemented, and note that because our code uses a proper 
> > Linux file descriptor abstraction, such a sys_read_fds() facility 
> > would help _other_ applications as well, not just performance 
> > counters.
> > 
> > But it brings complications: demultiplexing of error conditions on 
> > individual counters is a real pain with any compound abstraction. We 
> > very consciously went with the 'one fd, one object, one counter' 
> > design.
> 
> And I think that is the fundamental flaw.  On the machines I am 
> familiar with, the performance counters are not separate things that can 
> individually and independently be assigned to count one thing or 
> another.

Today we've implemented virtual counter scheduling in our to-be-v2 code:

   3 files changed, 36 insertions(+), 1 deletion(-)

hello.c gives:

 counter[0 cycles              ]:  10121258163 , delta:    844256826 events
 counter[1 instructions        ]:   4160893621 , delta:    347054666 events
 counter[2 cache-refs          ]:         2297 , delta:          179 events
 counter[3 cache-misses        ]:            3 , delta:            0 events
 counter[4 branch-instructions ]:    799422166 , delta:     66551572 events
 counter[5 branch-misses       ]:         7286 , delta:          775 events

All we need to get that array of information from 6 sw counters is a 
_single_ hardware counter. I'm not sure where you read "you must map sw 
counters to hw counters directly" or "hw counters must be independent of 
each other" into our design - it's not part of it, emphatically.
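
As a toy model of how several virtual counters can share one hardware 
counter, here is a minimal sketch. The message does not describe the 
scheduling policy, so everything in it is an assumption made for 
illustration: the round-robin rotation, the constant per-slice event 
rates, the scaling of raw counts by the share of time each counter was 
live, and the names `multiplex` and `event_rates`.

```python
def multiplex(event_rates, n_slices, hw_counters=1):
    """Rotate virtual counters over a limited number of hardware counters.

    event_rates maps a counter name to a made-up, constant number of
    events per time slice. Returns name -> (raw, slices_live, estimate).
    """
    names = list(event_rates)
    live_per_slice = min(hw_counters, len(names))
    raw = dict.fromkeys(names, 0)
    slices_live = dict.fromkeys(names, 0)
    start = 0
    for _ in range(n_slices):
        # Only the counters currently holding a hardware slot see events.
        for k in range(live_per_slice):
            name = names[(start + k) % len(names)]
            raw[name] += event_rates[name]
            slices_live[name] += 1
        start = (start + live_per_slice) % len(names)
    result = {}
    for name in names:
        live = slices_live[name]
        # Scale the raw count up to the whole run; 0 if never scheduled.
        estimate = raw[name] * n_slices // live if live else 0
        result[name] = (raw[name], live, estimate)
    return result


rates = {"cycles": 1000, "instructions": 400, "branch-misses": 3}
for name, (raw, live, estimate) in multiplex(rates, n_slices=6).items():
    print(f"{name:14s} raw={raw:5d} live={live} estimate={estimate}")
```

With constant rates the scaled estimate matches the true total exactly; 
with bursty real workloads it would only approximate it, and the 
approximation improves as more hardware counters become available.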

And i don't see your (fully correct!) statement above about counter 
constraints as being in any sort of conflict with what we are doing.

Intel hardware is just as constrained as powerpc hardware: there are 
counter inter-dependencies and many CPUs have just two performance 
counters. We very much took this into account while designing this code.

[ Obviously, you _can_ do higher quality profiling if you have more 
  hardware resources that help it. Nothing will change that fact. ]

> Rather, what the hardware provides is ONE performance monitor unit, 
> which the OS can context-switch between tasks.  The performance monitor 
> unit has several counters that can be assigned (within limits) to count 
> various aspects of the performance of the code being executed.  That is 
> why, for instance, if you ask for the counters to be frozen when one of 
> them overflows, they all get frozen at that point.

i don't see this as an issue at all - it's a feature of powerpc over x86 
that the core perfcounter code can support just fine. The overflow IRQ 
handler is arch specific. The overflow IRQ handler, if it triggers, 
updates the sw counters, creates any event records if needed, wakes up 
the monitor task if needed, and continues the task and performance 
measurement without having scheduled out. Demultiplexing of hw counters 
is arch-specific.
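
The handler steps listed above can be sketched as a toy model. This is 
not the arch code from the patch set: the class `VirtualCounter`, its 
fields, the 32-bit default register width and the rule of one event 
record and one wakeup per overflow are all hypothetical.

```python
class VirtualCounter:
    """A wide software counter backed by a narrow, wrapping hardware one."""

    def __init__(self, hw_bits=32, record_events=False):
        self.hw_range = 1 << hw_bits  # values the simulated register can hold
        self.hw = 0                   # simulated hardware register
        self.sw = 0                   # events already folded into software
        self.record_events = record_events
        self.records = []             # event records for the monitor task
        self.wakeups = 0              # times the monitor task was woken

    def count(self, events):
        """Advance the hardware register, firing the handler on each wrap."""
        wraps, self.hw = divmod(self.hw + events, self.hw_range)
        for _ in range(wraps):
            self.on_overflow()

    def on_overflow(self):
        """Stand-in for the overflow IRQ handler; the task is not descheduled."""
        self.sw += self.hw_range           # update the software counter
        if self.record_events:
            self.records.append(self.sw)   # create an event record
            self.wakeups += 1              # wake up the monitor task

    def read(self):
        """Full count: folded overflows plus what is still in the register."""
        return self.sw + self.hw


c = VirtualCounter(hw_bits=4, record_events=True)
c.count(10)
c.count(10)
print(c.read(), c.records, c.wakeups)
```

Hardware where one overflow freezes all counters together would fold 
every register in the same handler call; the per-counter model here 
leaves that out.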

	Ingo
