From: Frederic Weisbecker <fweisbec@gmail.com>
To: Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Paul Mackerras <paulus@samba.org>,
Stephane Eranian <eranian@google.com>,
Markus Metzger <markus.t.metzger@intel.com>,
Robert Richter <robert.richter@amd.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [RFC] BTS based perf user callchains
Date: Mon, 2 Aug 2010 20:35:08 +0200 [thread overview]
Message-ID: <20100802183506.GA8962@nowhere> (raw)
Hi,
As you may know, there is an issue with user stacktraces: they require
userspace apps to be built with frame pointers.
So there is something we can try: dump a piece of the top user stack page
each time an event hits, and let the tools deal with it later using
the DWARF information.
But before trying that, which might require heavy copies, I would like to
try something based on BTS. The idea is to look at the branch buffer and
only pick addresses of branches that originated from "call" instructions.
So we want BTS activated only in the user ring, with no need for an
interrupt once we reach the limit of the buffer: we can just run in a kind of
live mode and read on demand. This could be a secondary perf event that has no
mmap buffer, something used only internally by the kernel on behalf of other
true perf events in a given context. Primary perf events can then read this
BTS buffer when they want.
Now there are two ways:
- record the whole branch buffer each time we overflow on another perf event,
and let post-processing in userspace deal with "call" instruction filtering to
build the stacktrace on top of the branch trace.
- do the "call" filtering at record time. That requires inspecting each
recorded branch and looking at the instruction content from the fast path.
I don't know which solution would be faster.
I'm not even sure that will work. Also, while looking at the BTS implementation
in perf, I see we have one BTS buffer per CPU. But that doesn't look right, as
the code flow is not linear per CPU but per task. Hence I suspect we need
one BTS buffer per task. But maybe someone tried that and encountered a
problem?
Tell me your thoughts.
Thanks.
Thread overview: 5+ messages
2010-08-02 18:35 Frederic Weisbecker [this message]
2010-08-02 18:38 ` [RFC] BTS based perf user callchains Peter Zijlstra
2010-08-02 18:41 ` Frederic Weisbecker
2010-08-02 19:47 ` Peter Zijlstra
2010-08-03 6:53 ` Metzger, Markus T