public inbox for linux-kernel@vger.kernel.org
From: Corey Ashford <cjashfor@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Paul Mackerras <paulus@samba.org>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] perf_counter: extensible perf_counter_attr
Date: Tue, 09 Jun 2009 16:16:52 -0700
Message-ID: <4A2EED64.9070605@linux.vnet.ibm.com>
In-Reply-To: <20090609220020.GA26526@elte.hu>



Ingo Molnar wrote:
> * Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:
> 
>> Thanks for your reply, Ingo.
>>
>> Ingo Molnar wrote:
>>> I think PEBS is best supported by a generic abstraction. Something  
>>> like this: it's basically a special sampling format, that generates a 
>>> record of:
>>>
>>> 	struct pt_regs regs;
>>> 	__u64 insn_latency; /* optional */
>>> 	__u64 data_address; /* optional */
>>>
>>> this is pretty generic.
>>>
>>> The raw CPU records have a CPU specific format, and they have to be  
>>> demultiplexed anyway (on Nehalem, which can have up to four separate  
>>> PEBS counters - but each output into the same DS area), so the  
>>> lowlevel arch code converts the CPU record into the above generic  
>>> sample record when it copies it into the mmap pages. It's a quick copy 
>>> so no big deal performance-wise.
>>>
>>> ( Details:
>>>
>>>    - there might be some additional complications from sampling
>>>      32-bit contexts, but that too is a mostly low level detail that
>>>      gets hidden.
>>>
>>>    - we might use a tiny bit more compact registers structure than
>>>      struct pt_regs. OTOH it's a well-known structure so it makes
>>>      sense to standardize on it, even if the CPU doesnt sample all
>>>      registers.
>>> )
>>>
>>>   
>> I see, so that's how you'd return the data.  How would a user 
>> specify that they want to use PEBS?
> 
> They wouldnt need to. PEBS would result in more precise samples when 
> certain counters are used - and transparently so.
> 
> The non-PEBS NMI samples are pretty accurate to begin with, so it's 
> not a _major_ difference in quality.
> 
> PEBS would also auto-activate when a user wants to sample say the 
> instruction latency as well - and on non-PEBS hardware the platform 
> code would refuse to open the counter.
> 
> PEBS could also be used to reduce the number of NMIs needed - so 
> it's a transparent speedup as well.
> 
>> Another observation is that you'd need some sort of bit vector, or 
>> at the least a document, that describes which registers are valid 
>> in the pt_regs struct.
> 
> Initially i think we should use PEBS to transparently enhance 
> regular IP samples.
> 
> What would we use the other registers for? Many registers will have 
> stack context dependencies so the PEBS data cannot really be 
> analyzed in any meaningful way after the fact.

Well, you were the one who offered up the idea of using pt_regs as an
abstraction.  For the case of PEBS, I think 98% of it would be empty, since the
sampling mechanism can get a PC and maybe a data address (?).
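
To make the bit-vector suggestion concrete, here is a minimal sketch of a
sample record that carries a validity mask, so a consumer can tell which fields
the hardware actually filled in.  Everything in it is an assumption made up for
illustration: the struct names, the field layout, and the SAMPLE_VALID_* bits
do not exist in the kernel, and the "raw" record is only a stand-in for
whatever CPU-specific format the PMU writes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical validity bits; these names are illustrative only. */
enum sample_valid {
	SAMPLE_VALID_IP           = 1u << 0,
	SAMPLE_VALID_DATA_ADDRESS = 1u << 1,
	SAMPLE_VALID_INSN_LATENCY = 1u << 2,
};

/* Hypothetical stand-in for a CPU-specific record written by the PMU. */
struct raw_cpu_record {
	uint64_t ip;
	uint64_t data_address;
	int      has_data_address;	/* non-zero if data_address was captured */
};

/* Hypothetical generic record: 'valid' says which fields carry data. */
struct generic_sample {
	uint32_t valid;
	uint64_t ip;
	uint64_t data_address;
	uint64_t insn_latency;
};

/* Convert a raw record, marking only the fields that were captured. */
struct generic_sample convert_record(const struct raw_cpu_record *raw)
{
	struct generic_sample s;

	memset(&s, 0, sizeof(s));

	s.ip = raw->ip;
	s.valid |= SAMPLE_VALID_IP;

	if (raw->has_data_address) {
		s.data_address = raw->data_address;
		s.valid |= SAMPLE_VALID_DATA_ADDRESS;
	}

	/* insn_latency is never set here, so its valid bit stays clear. */
	return s;
}
```

A consumer would test (s.valid & SAMPLE_VALID_DATA_ADDRESS) before reading
s.data_address, rather than guessing from a zero value.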

> 
> (except perhaps for the narrow purpose of debugging)
> 
>>> Can you see desirable PEBS-alike PMU features that cannot be expressed 
>>> via such means?
>> Power PMU's provide some fairly complex features, such as a 
>> thresholding mechanism which is used for marking instructions, and 
>> also there's an Instruction Matching CAM which can be used to mark 
>> only on certain instruction types.  Since these features are 
>> present only on Power, I'm not sure it makes sense to go to the 
>> trouble of abstracting them for use on other arch/chip designs.
> 
> Could you please describe the low level semantics more accurately - 
> at least of the ones you find to be the most useful in practice?

Ok, some disclosure here: we have not yet supported either of these features of
the Power PMUs in any open source code (and perhaps not in the proprietary code
either, but I don't know about that).  Since these features are described in an
IBM proprietary document, I can't describe how they work here, except to say
that they are present in the chip.

I realize it's not very useful to come here and complain that there's no support
for something I can't describe!  (maybe later or maybe never... not sure)

But in general I do think there should be some sort of provision in the
interface for chip-specific features for which fully abstracting their
function doesn't make a lot of sense.  At some point, if we find that there's a
good reason to support these features in open source code (and thereby divulge
how they work), it would be nice if there were an avenue for adding such support
to PCL.
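
As a rough illustration of the kind of provision I mean, here is a minimal
sketch in which the attribute struct carries one opaque, chip-specific word.
Generic code does not interpret it; it only checks that every bit set is one
the platform code says it understands.  All names here (example_counter_attr,
arch_config, example_check_arch_config) are hypothetical and are not part of
the perf_counter interface.

```c
#include <stdint.h>

/* Hypothetical attribute layout; not the real perf_counter_attr. */
struct example_counter_attr {
	uint32_t type;		/* generic counter type */
	uint64_t config;	/* generic event selection */
	uint64_t arch_config;	/* opaque chip-specific bits, 0 if unused */
};

/*
 * Accept the attr only if every chip-specific bit it sets is in the
 * mask that the platform code advertises.  Returns 0 on success and
 * -1 if an unsupported bit is requested.
 */
int example_check_arch_config(const struct example_counter_attr *attr,
			      uint64_t arch_supported_bits)
{
	if (attr->arch_config & ~arch_supported_bits)
		return -1;
	return 0;
}
```

On a platform with no chip-specific features the mask would simply be 0, so any
request that sets arch_config is refused and purely generic requests are
unaffected.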

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
cjashfor@us.ibm.com

