* perf_arch_fetch_caller_regs...
From: David Miller @ 2010-03-19 4:02 UTC
To: fweisbec; +Cc: mingo, paulus, linux-arch
Can we please remove the CALLER_ADDR0 et al. evaluations at the top
level in perf_fetch_caller_regs()?
I take great pains to avoid having to flush the register windows on
sparc64 even when fetching callchains et al and any
__builtin_return_address() with an argument greater than zero is going
to force a register window flush to get emitted by gcc undoing all of
my hard work :-)
You guys can put it into the x86 perf_fetch_caller_regs() or similar.
If you need it to be evaluated at the call site, make the inline
overridable by the platform headers.
I noticed that the powerpc assembler Paul posted the past few days
ignores this "ip" arg passed down and computes it by hand as it
walks up the stack chain in assembler. PowerPC therefore might be
getting similar inefficiencies due to this CALLER_ADDR? stuff.
Thanks.
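
A minimal sketch of the "overridable inline" idea, assuming hypothetical
sketch_* names (none of these are kernel identifiers). The generic
fallback only uses level 0 of the GCC builtins, which is the case the
mail above does not object to; a platform header included earlier could
define sketch_arch_fetch_caller_regs to supply its own version instead.

```c
/* Hypothetical register snapshot; the real struct pt_regs is per-arch. */
struct sketch_regs {
    unsigned long ip;   /* instruction pointer of the caller */
    unsigned long fp;   /* frame pointer at the call site */
};

/*
 * A platform header could override the hook before this point, e.g.
 *   #define sketch_arch_fetch_caller_regs my_arch_fetch_caller_regs
 * If it did not, fall back to a generic version.
 */
#ifndef sketch_arch_fetch_caller_regs
static inline void sketch_generic_fetch_caller_regs(struct sketch_regs *regs)
{
    /* Level 0 only: no walk up the caller chain is requested here. */
    regs->ip = (unsigned long)__builtin_return_address(0);
    regs->fp = (unsigned long)__builtin_frame_address(0);
}
#define sketch_arch_fetch_caller_regs sketch_generic_fetch_caller_regs
#endif

/* Common entry point: it delegates entirely to the per-arch hook. */
static inline void sketch_fetch_caller_regs(struct sketch_regs *regs)
{
    sketch_arch_fetch_caller_regs(regs);
}
```

With this shape, the common code never evaluates a deeper
__builtin_return_address() itself, so each platform decides how (and
whether) to look further up the stack.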
* Re: perf_arch_fetch_caller_regs...
From: Frederic Weisbecker @ 2010-03-19 4:24 UTC
To: David Miller; +Cc: mingo, paulus, linux-arch
On Thu, Mar 18, 2010 at 09:02:41PM -0700, David Miller wrote:
>
> Can we please remove the CALLER_ADDR0 et al. evaluations at the top
> level in perf_fetch_caller_regs()?
>
> I take great pains to avoid having to flush the register windows on
> sparc64 even when fetching callchains et al and any
> __builtin_return_address() with an argument greater than zero is going
> to force a register window flush to get emitted by gcc undoing all of
> my hard work :-)
Ah. But does that really cause bad things? I mean you don't need
to save all the ix/ox/lx registers, only the instruction pointer,
frame pointer, flags, and something that can help user_mode() to
return 0.
I don't know this area deeply, but the frame pointer is
part of the window in the ix things, right? And the CALLER
thing screws up the frame pointer and makes it hard to rewind
precisely?
>
> You guys can put it into the x86 perf_fetch_caller_regs() or similar.
>
> If you need it to be evaluated at the call site, make the inline
> overridable by the platform headers.
>
> I noticed that the powerpc assembler Paul posted the past few days
> ignores this "ip" arg passed down and computes it by hand as it
> walks up the stack chain in assembler. PowerPC therefore might be
> getting similar inefficiencies due to this CALLER_ADDR? stuff.
Yeah, ok, if this is that much of a burden (or useless) for archs then
I'm going to remove it.
BTW, I'm reworking this to make it a macro for various reasons:
weak and export_symbol are unfriendly together in the same file,
and regular software events are going to need it, so I can no
longer put the export_symbol in a file other than perf_event.c.
Since that's impossible, I need perf_fetch_caller_regs() to be a
macro now.
* Re: perf_arch_fetch_caller_regs...
From: David Miller @ 2010-03-19 4:47 UTC
To: fweisbec; +Cc: mingo, paulus, linux-arch
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Fri, 19 Mar 2010 05:24:16 +0100
> On Thu, Mar 18, 2010 at 09:02:41PM -0700, David Miller wrote:
>>
>> Can we please remove the CALLER_ADDR0 et al. evaluations at the top
>> level in perf_fetch_caller_regs()?
>>
>> I take great pains to avoid having to flush the register windows on
>> sparc64 even when fetching callchains et al and any
>> __builtin_return_address() with an argument greater than zero is going
>> to force a register window flush to get emitted by gcc undoing all of
>> my hard work :-)
>
>
> Ah. But does that really cause bad things? I mean you don't need
> to save all the ix/ox/lx registers, only the instruction pointer,
> frame pointer, flags, and something that can help user_mode() to
> return 0.
>
> I don't know this area deeply, but the frame pointer is
> part of the window in the ix things, right? And the CALLER
> thing screws up the frame pointer and makes it hard to rewind
> precisely?
It causes bad things, as in performance is ridiculously bad if
GCC emits that register window flush. That's the whole point of
my mail: it undoes all of the hard work I do to avoid the register
window flushes.
I fix the performance problems by doing it by hand, walking up the
unflushed register windows and storing only the frame pointer and
return PC into the stack frame.
> Yeah, ok, if this is that much of a burden (or useless) for archs then
> I'm going to remove it.
Thanks.
* Re: perf_arch_fetch_caller_regs...
From: Paul Mackerras @ 2010-03-19 4:32 UTC
To: David Miller; +Cc: fweisbec, mingo, linux-arch
On Thu, Mar 18, 2010 at 09:02:41PM -0700, David Miller wrote:
>
> Can we please remove the CALLER_ADDR0 et al. evaluations at the top
> level in perf_fetch_caller_regs()?
>
> I take great pains to avoid having to flush the register windows on
> sparc64 even when fetching callchains et al and any
> __builtin_return_address() with an argument greater than zero is going
> to force a register window flush to get emitted by gcc undoing all of
> my hard work :-)
>
> You guys can put it into the x86 perf_fetch_caller_regs() or similar.
>
> If you need it to be evaluated at the call site, make the inline
> overridable by the platform headers.
>
> I noticed that the powerpc assembler Paul posted the past few days
> ignores this "ip" arg passed down and computes it by hand as it
> walks up the stack chain in assembler. PowerPC therefore might be
> getting similar inefficiencies due to this CALLER_ADDR? stuff.
Well, it would except that CALLER_ADDR1, 2, etc. turn into (0) on
powerpc because we use the generic definition and we don't define
CONFIG_FRAME_POINTER (it's meaningless on powerpc because the ABI
defines that each stack frame always has a pointer to the previous
frame).
I should fix CALLER_ADDRx on powerpc one day, then we will have the
extra inefficiency.
Paul.
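
The effect described above can be illustrated with a cut-down imitation
of a generic fallback. This is a sketch, not the kernel header: the
SKETCH_* names and the SKETCH_FRAME_POINTER switch are hypothetical
stand-ins for CALLER_ADDRx and CONFIG_FRAME_POINTER, and the real
definitions may differ in detail.

```c
/*
 * Uncomment to mimic a build with frame pointers enabled; deeper
 * levels then expand to a real __builtin_return_address(n) call.
 */
/* #define SKETCH_FRAME_POINTER 1 */

/* Level 0 is always available: the return address of this function. */
#define SKETCH_CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))

#ifdef SKETCH_FRAME_POINTER
# define SKETCH_CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
#else
/* Without the config switch, deeper levels silently become zero. */
# define SKETCH_CALLER_ADDR1 0UL
#endif

/* noinline keeps each helper in its own frame so level 0 is meaningful. */
static __attribute__((noinline)) unsigned long sketch_caller_addr0(void)
{
    return SKETCH_CALLER_ADDR0;
}

static __attribute__((noinline)) unsigned long sketch_caller_addr1(void)
{
    return SKETCH_CALLER_ADDR1;
}
```

As written (switch left undefined), sketch_caller_addr1() returns 0, so
any user of the deeper levels gets no caller information but also pays
no cost for it.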
* Re: perf_arch_fetch_caller_regs...
From: David Miller @ 2010-03-19 4:51 UTC
To: paulus; +Cc: fweisbec, mingo, linux-arch
From: Paul Mackerras <paulus@samba.org>
Date: Fri, 19 Mar 2010 15:32:02 +1100
> On Thu, Mar 18, 2010 at 09:02:41PM -0700, David Miller wrote:
>> I noticed that the powerpc assembler Paul posted the past few days
>> ignores this "ip" arg passed down and computes it by hand as it
>> walks up the stack chain in assembler. PowerPC therefore might be
>> getting similar inefficiencies due to this CALLER_ADDR? stuff.
>
> Well, it would except that CALLER_ADDR1, 2, etc. turn into (0) on
> powerpc because we use the generic definition and we don't define
> CONFIG_FRAME_POINTER (it's meaningless on powerpc because the ABI
> defines that each stack frame always has a pointer to the previous
> frame).
>
> I should fix CALLER_ADDRx on powerpc one day, then we will have the
> extra inefficiency.
Gosh, that CONFIG_FRAME_POINTER dependency is 'nifty', how does the
CALLER_ADDR{1,2} usage made by the scheduler tracepoints work on
powerpc then?
I suppose I could define HAVE_ARCH_CALLER_ADDR and optimize them
on sparc64, similar to what SH is doing.