From: Frederic Weisbecker <fweisbec@gmail.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org,
anton@samba.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] powerpc/perf_events: Implement perf_arch_fetch_caller_regs for powerpc
Date: Tue, 16 Mar 2010 21:56:28 +0100 [thread overview]
Message-ID: <20100316205609.GA6822@nowhere> (raw)
In-Reply-To: <20100316032213.GA3656@drongo>
On Tue, Mar 16, 2010 at 02:22:13PM +1100, Paul Mackerras wrote:
> On Mon, Mar 15, 2010 at 10:04:54PM +0100, Frederic Weisbecker wrote:
> > On Mon, Mar 15, 2010 at 04:46:15PM +1100, Paul Mackerras wrote:
>
> > > 14.99% perf [kernel.kallsyms] [k] ._raw_spin_lock
> > > |
> > > --- ._raw_spin_lock
> > > |
> > > |--25.00%-- .alloc_fd
> > > | (nil)
> > > | |
> > > | |--50.00%-- .anon_inode_getfd
> > > | | .sys_perf_event_open
> > > | | syscall_exit
> > > | | syscall
> > > | | create_counter
> > > | | __cmd_record
> > > | | run_builtin
> > > | | main
> > > | | 0xfd2e704
> > > | | 0xfd2e8c0
> > > | | (nil)
> > >
> > > ... etc.
> > >
> > > Signed-off-by: Paul Mackerras <paulus@samba.org>
> >
> >
> > Cool!
>
> By the way, I notice that gcc tends to inline the tracing functions,
> which means that by going up 2 stack frames we miss some of the
> functions. For example, for the lock:lock_acquire event, we have
> _raw_spin_lock() -> lock_acquire() -> trace_lock_acquire() ->
> perf_trace_lock_acquire() -> perf_trace_templ_lock_acquire() ->
> perf_fetch_caller_regs() -> perf_arch_fetch_caller_regs().
>
> But in the ppc64 kernel binary I just built, gcc inlined
> trace_lock_acquire in lock_acquire, and perf_trace_templ_lock_acquire
> in perf_trace_lock_acquire. Given that perf_fetch_caller_regs is
> explicitly inlined, going up two levels from perf_fetch_caller_regs
> gets us to _raw_spin_lock, whereas I think you intended it to get us
> to trace_lock_acquire. I'm not sure what to do about that - any
> thoughts?
Yeah I've indeed seen this, and the problem is especially
the fact perf_trace_templ_lock_acquire may or may not be
inlined.
It is used for trace events that use the TRACE_EVENT_CLASS
thing. We define a pattern of event structure that is shared
among several events.
For example event A and event B share perf_trace_templ_foo.
Both will have a different perf_trace_blah but those
perf_trace_blah will both call the same perf_trace_templ_foo(),
in this case, it won't be inlined.
Events that don't share a pattern will have their
perf_trace_templ inlined, because there will be an exclusive 1:1
relationship between both.
The rewind of 2 is well suited for events sharing a pattern, ip
will match the right event source, and not one of its callers.
Unfortunately, the others are more unlucky.
I didn't mind much about this yet because it had no bad effect
on lock events. Quite the opposite actually. It's not very interesting
to have lock_acquire as the event source unless you have a callchain.
If you have no callchain, you'll see a lot of such in perf report:
sym1 lock_aquire
sym2 lock_acquire
sym3 lock_acquire
What you want here is the function that called lock_acquire.
But if you have a callchain it's fine, because you have the nature
of the event (lock_aquire) and the origin as well.
That said, lock events are an exception where the mistake
has a lucky result. Other inlined events are harmed as we lose
their most important caller. So I'm going to fix that.
I can just fetch the regs from perf_trace_foo() and pass them
to perf_trace_templ_foo() and here we are.
Thanks.
WARNING: multiple messages have this Message-ID (diff)
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>,
benh@kernel.crashing.org, linux-kernel@vger.kernel.org,
anton@samba.org, linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc/perf_events: Implement perf_arch_fetch_caller_regs for powerpc
Date: Tue, 16 Mar 2010 21:56:28 +0100 [thread overview]
Message-ID: <20100316205609.GA6822@nowhere> (raw)
In-Reply-To: <20100316032213.GA3656@drongo>
On Tue, Mar 16, 2010 at 02:22:13PM +1100, Paul Mackerras wrote:
> On Mon, Mar 15, 2010 at 10:04:54PM +0100, Frederic Weisbecker wrote:
> > On Mon, Mar 15, 2010 at 04:46:15PM +1100, Paul Mackerras wrote:
>
> > > 14.99% perf [kernel.kallsyms] [k] ._raw_spin_lock
> > > |
> > > --- ._raw_spin_lock
> > > |
> > > |--25.00%-- .alloc_fd
> > > | (nil)
> > > | |
> > > | |--50.00%-- .anon_inode_getfd
> > > | | .sys_perf_event_open
> > > | | syscall_exit
> > > | | syscall
> > > | | create_counter
> > > | | __cmd_record
> > > | | run_builtin
> > > | | main
> > > | | 0xfd2e704
> > > | | 0xfd2e8c0
> > > | | (nil)
> > >
> > > ... etc.
> > >
> > > Signed-off-by: Paul Mackerras <paulus@samba.org>
> >
> >
> > Cool!
>
> By the way, I notice that gcc tends to inline the tracing functions,
> which means that by going up 2 stack frames we miss some of the
> functions. For example, for the lock:lock_acquire event, we have
> _raw_spin_lock() -> lock_acquire() -> trace_lock_acquire() ->
> perf_trace_lock_acquire() -> perf_trace_templ_lock_acquire() ->
> perf_fetch_caller_regs() -> perf_arch_fetch_caller_regs().
>
> But in the ppc64 kernel binary I just built, gcc inlined
> trace_lock_acquire in lock_acquire, and perf_trace_templ_lock_acquire
> in perf_trace_lock_acquire. Given that perf_fetch_caller_regs is
> explicitly inlined, going up two levels from perf_fetch_caller_regs
> gets us to _raw_spin_lock, whereas I think you intended it to get us
> to trace_lock_acquire. I'm not sure what to do about that - any
> thoughts?
Yeah I've indeed seen this, and the problem is especially
the fact perf_trace_templ_lock_acquire may or may not be
inlined.
It is used for trace events that use the TRACE_EVENT_CLASS
thing. We define a pattern of event structure that is shared
among several events.
For example event A and event B share perf_trace_templ_foo.
Both will have a different perf_trace_blah but those
perf_trace_blah will both call the same perf_trace_templ_foo(),
in this case, it won't be inlined.
Events that don't share a pattern will have their
perf_trace_templ inlined, because there will be an exclusive 1:1
relationship between both.
The rewind of 2 is well suited for events sharing a pattern, ip
will match the right event source, and not one of its callers.
Unfortunately, the others are more unlucky.
I didn't mind much about this yet because it had no bad effect
on lock events. Quite the opposite actually. It's not very interesting
to have lock_acquire as the event source unless you have a callchain.
If you have no callchain, you'll see a lot of such in perf report:
sym1 lock_aquire
sym2 lock_acquire
sym3 lock_acquire
What you want here is the function that called lock_acquire.
But if you have a callchain it's fine, because you have the nature
of the event (lock_aquire) and the origin as well.
That said, lock events are an exception where the mistake
has a lucky result. Other inlined events are harmed as we lose
their most important caller. So I'm going to fix that.
I can just fetch the regs from perf_trace_foo() and pass them
to perf_trace_templ_foo() and here we are.
Thanks.
next prev parent reply other threads:[~2010-03-16 20:56 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-15 5:46 [PATCH] powerpc/perf_events: Implement perf_arch_fetch_caller_regs for powerpc Paul Mackerras
2010-03-15 5:46 ` Paul Mackerras
2010-03-15 17:36 ` Michael Neuling
2010-03-15 17:36 ` Michael Neuling
2010-03-15 21:04 ` Frederic Weisbecker
2010-03-15 21:04 ` Frederic Weisbecker
2010-03-16 3:22 ` Paul Mackerras
2010-03-16 3:22 ` Paul Mackerras
2010-03-16 20:56 ` Frederic Weisbecker [this message]
2010-03-16 20:56 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100316205609.GA6822@nowhere \
--to=fweisbec@gmail.com \
--cc=anton@samba.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mingo@elte.hu \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.