Re: bts & perf_counters

From: Ingo Molnar <mingo@elte.hu>
To: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Markus Metzger <markus.t.metzger@googlemail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: bts & perf_counters
Date: Tue, 30 Jun 2009 21:32:29 +0200
Message-ID: <20090630193229.GD20567@elte.hu>
In-Reply-To: <928CFBE8E7CB0040959E56B4EA41A77EBE519AE5@irsmsx504.ger.corp.intel.com>

* Metzger, Markus T <markus.t.metzger@intel.com> wrote:

> > How does 'interval' get mixed with BTS?
> 
> We could view BTS as event-based sampling with interval=1. The 
> sample we collect is the <from, to> address pair of an executed 
> branch and the sampling interval is 1, i.e. we store a sample for 
> every branch. Wouldn't this be how BTS integrates into 
> perf_counters?

Yeah, this is how i view it too.
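
To make the "interval=1" view concrete, here is a minimal sketch in Python. It is only a model of the idea, not kernel code: the function name `sample_branches` and the example addresses are made up for illustration. It treats the executed branches as a stream and keeps every `interval`-th one, so interval=1 degenerates into a full branch trace.

```python
from typing import Iterable, List, Tuple

Branch = Tuple[int, int]  # (from_address, to_address) of one executed branch


def sample_branches(branches: Iterable[Branch], interval: int) -> List[Branch]:
    """Keep every `interval`-th branch, as an event-based sampler would."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    samples = []
    for count, branch in enumerate(branches, start=1):
        if count % interval == 0:
            samples.append(branch)
    return samples


# Hypothetical branch stream; the addresses are illustrative only.
executed = [(0x400100, 0x400200), (0x400210, 0x400100), (0x400108, 0x400300)]

full_trace = sample_branches(executed, interval=1)  # BTS-like: every branch
sparse = sample_branches(executed, interval=2)      # ordinary sampling
```

With interval=1 the sample list equals the input stream, which is the sense in which BTS fits the sampling model.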

> One of the big advantages that comes with using the perf_counter 
> framework is that you could mix branch tracing with other forms of 
> profiling and sampling.

Correct.

> >> Would it be possible for a user to profile the same task twice? 
> >> He could then use different buffers for different sampling 
> >> intervals.
> >
> > It's possible to open multiple counters to the same task, yes.
> 
> That's good. And users could mmap every counter they open in order 
> to get multiple perf event streams?

Yes.

> OK. The existing implementation reconfigured DS area to have the 
> h/w already collect the trace into the correct buffer. The only 
> copying that is ever needed is to copy it into user-space while 
> translating the arch-specific format into an arch-independent 
> format.
> 
> This is obviously only possible for a single user. Copying the 
> data is definitely more flexible if we expect multiple users of 
> that data with different-sized buffers.

Yeah. [ That decoupling is nice as it also allows multiplexing - 
there's nothing that prevents two independent monitor tasks from 
sampling the same task (beyond the inevitable runtime overhead 
that is inherent in BTS anyway). ]
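
A rough Python model of that decoupling, under the assumption that each open counter owns its own buffer and the drained hardware records are copied into every one of them. `CounterBuffer` and `fan_out` are hypothetical names, and capacities are counted in records rather than pages to keep the sketch short.

```python
from collections import deque


class CounterBuffer:
    """Hypothetical per-counter buffer; `capacity` is a number of records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()
        self.lost = 0  # records dropped because this buffer was full

    def push(self, record):
        if len(self.records) >= self.capacity:
            self.lost += 1
            return
        self.records.append(record)


def fan_out(hw_records, buffers):
    """Copy each drained hardware record into every counter's buffer."""
    for record in hw_records:
        for buf in buffers:
            buf.push(record)
```

Because every consumer gets its own copy, a monitor with a small buffer only loses its own records and does not disturb a second monitor of the same task.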

> > If a task schedules out then it will have its DS area drained 
> > already to the mmap buffer - i.e. it's all properly 
> > synchronized.
> 
> When is that draining done? Somewhere in schedule()? Wouldn't that 
> be quite expensive for a few pages of BTS buffer?

Well, it is an open question how frequently we want to move 
information from the DS area into the mmap pages.

The most direct approach would be to 'flush' the DS from two places: 
the threshold IRQ handler plus from the context switch code if the 
BTS counter gets deactivated. In the latter case BTS activities have 
to stop anyway, so the DS can be flushed to the mmap pages.

Or is your mental model for getting the BTS records from the DS to 
the mmap pages significantly different?

I think we should shoot for the simplest approach initially - we can 
do other, more sophisticated streaming modes later as well - they 
will not differ in functionality, only in performance.
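
As a sketch of that simplest approach, the following Python model flushes from exactly the two places named above: when the DS fill level reaches the threshold (standing in for the threshold IRQ) and when the counter is deactivated at context switch. All class and method names are assumptions made for illustration, and the "translation" step is reduced to turning a raw tuple into a dict.

```python
class DSArea:
    """Toy stand-in for the hardware DS buffer; threshold is in records."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.records = []  # raw (from, to) tuples as "written by hardware"


class BTSCounter:
    def __init__(self, ds):
        self.ds = ds
        self.mmap_pages = []  # stand-in for the counter's mmap output buffer
        self.active = True

    def flush(self):
        # Translate the raw records into an arch-independent form and
        # empty the DS area so the hardware can reuse it.
        self.mmap_pages.extend({"from": f, "to": t} for f, t in self.ds.records)
        self.ds.records.clear()

    def on_branch(self, src, dst):
        if not self.active:
            return  # tracing is stopped while the task is scheduled out
        self.ds.records.append((src, dst))
        if len(self.ds.records) >= self.ds.threshold:
            self.flush()  # flush point 1: threshold "IRQ"

    def sched_out(self):
        self.active = False
        self.flush()  # flush point 2: counter deactivated at context switch

    def sched_in(self):
        self.active = True
```

After sched_out() the DS area is always empty, so whatever reads the mmap buffer sees a consistent stream without having to look at the DS area itself.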

> Hmmm, I'll see what I can do. Please don't expect a minimally 
> working prototype to be bug-free from the beginning.

Sure, i don't.

> I see identifying the beginning of the stream as well as random 
> accesses into the stream as bigger open points.
> 
> Maybe we could add a mode where records are zero-extended to a 
> fixed size. This would leave the choice to the user: compact 
> format or random access.

I agree that streaming is a problem because the debugger does not 
really want to poll() - such a fixed-size output mode and an 'ignore 
data_tail and overwrite old entries' ring-buffer mode of operation 
should be added.

The latter would be useful for tracepoints too for example, so such 
a 'flight recorder' or 'history buffer' mode is not limited to BTS.

So feel free to add something that meets your constant-size records 
needs - and we'll make sure it fits well into the rest of 
perfcounters.
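
To illustrate why constant-size records give random access, here is a small Python sketch. The record layout (two little-endian 64-bit fields, from and to) is an assumption chosen for the example, not a format proposed in this thread; the point is only that record i can be located by multiplication instead of by walking the stream.

```python
import struct

# Assumed layout for illustration: two 64-bit fields, <from, to>.
RECORD = struct.Struct("<QQ")


def pack_fixed(branches):
    """Zero-extend every address to 64 bits so all records have one size."""
    return b"".join(RECORD.pack(src, dst) for src, dst in branches)


def record_at(buf, index):
    """Random access: record `index` starts at byte index * RECORD.size."""
    return RECORD.unpack_from(buf, index * RECORD.size)
```

A compact variable-size format would save space, but finding the n-th record would then require parsing every record before it.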

So based on your suggestion we'd have two streaming models:

 - 'no information loss' output model where user-space poll()s and 
   tries hard not to lose events (this is what profilers and 
   reliable tracers do)

 - 'history ring-buffer' model - this is useful for debuggers and is 
   useful for certain modes of tracing as well. (crash-tracing for 
   example)
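
The difference between the two models can be sketched with one toy ring buffer in Python. This is a model under stated assumptions, not the perfcounters implementation: `RingBuffer` is a hypothetical class, `data_head` and `data_tail` are simple record counters here, and the `overwrite` flag selects the 'history' behaviour of ignoring data_tail.

```python
class RingBuffer:
    def __init__(self, size, overwrite):
        self.size = size
        self.overwrite = overwrite  # True selects the 'history' model
        self.slots = [None] * size
        self.data_head = 0  # total records written by the producer
        self.data_tail = 0  # total records consumed by the reader
        self.lost = 0

    def write(self, record):
        full = self.data_head - self.data_tail >= self.size
        if full and not self.overwrite:
            self.lost += 1  # no-loss model: never clobber unread data
            return False
        self.slots[self.data_head % self.size] = record
        self.data_head += 1
        return True

    def read_all(self):
        """No-loss reader: consume everything between tail and head."""
        out = [self.slots[i % self.size]
               for i in range(self.data_tail, self.data_head)]
        self.data_tail = self.data_head
        return out

    def history(self):
        """History reader: the newest `size` records, oldest first."""
        start = max(0, self.data_head - self.size)
        return [self.slots[i % self.size]
                for i in range(start, self.data_head)]
```

In the no-loss model the reader has to keep up (hence the poll()), while in the history model the reader can show up at any time, for example after a crash, and still find the most recent records.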

	Ingo
