* 'annotate' confusingly reports 'add' instruction as taking most CPU time
@ 2013-10-01 4:06 Johan Tibell
2013-10-01 12:53 ` Vince Weaver
0 siblings, 1 reply; 3+ messages in thread
From: Johan Tibell @ 2013-10-01 4:06 UTC (permalink / raw)
To: linux-perf-users
Hi,
I'm trying to use 'perf record' to find the hotspot inside a function.
The results I'm seeing are confusing. 'annotate' claims that a simple
'add' instruction is the main CPU time consumer"
0.00 : 408246: mov 0x8(%rbp),%rsi
0.31 :
40824a: mov 0x10(%rbp),%rcx
0.00 : 40824e:
lea 0x4(%rcx),%rdi
0.81 : 408252: mov
0x6(%rbx),%rbx
0.00 : 408256: mov
0x18(%rbx,%rax,8),%r8
---> 27.58 : 40825b: add $0x20,%rbp
0.00 : 40825f: jmpq 408338 <Main_zdwpolyzugo_info>
0.00 : 408264:
mov $0x7b9609,%ebx
0.00 : 408269: add
$0x20,%rbp
0.00 : 40826d: jmpq *0x0(%rbp)
5.41 : 408270: mov 0xd(%rbx),%rax
How shall I interpret this? Is it really the following jump that is to blame?
-- Johan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: 'annotate' confusingly reports 'add' instruction as taking most CPU time
2013-10-01 4:06 'annotate' confusingly reports 'add' instruction as taking most CPU time Johan Tibell
@ 2013-10-01 12:53 ` Vince Weaver
2013-10-01 15:14 ` Johan Tibell
0 siblings, 1 reply; 3+ messages in thread
From: Vince Weaver @ 2013-10-01 12:53 UTC (permalink / raw)
To: Johan Tibell; +Cc: linux-perf-users
On Mon, 30 Sep 2013, Johan Tibell wrote:
> I'm trying to use 'perf record' to find the hotspot inside a function.
> The results I'm seeing are confusing. 'annotate' claims that a simple
> 'add' instruction is the main CPU time consumer"
>
> 0.00 : 408246: mov 0x8(%rbp),%rsi
> 0.31 :
> 40824a: mov 0x10(%rbp),%rcx
> 0.00 : 40824e:
> lea 0x4(%rcx),%rdi
> 0.81 : 408252: mov
> 0x6(%rbx),%rbx
> 0.00 : 408256: mov
> 0x18(%rbx,%rax,8),%r8
> ---> 27.58 : 40825b: add $0x20,%rbp
>
> 0.00 : 40825f: jmpq 408338 <Main_zdwpolyzugo_info>
> 0.00 : 408264:
> mov $0x7b9609,%ebx
> 0.00 : 408269: add
> $0x20,%rbp
> 0.00 : 40826d: jmpq *0x0(%rbp)
>
> 5.41 : 408270: mov 0xd(%rbx),%rax
>
> How shall I interpret this? Is it really the following jump that is to blame?
It looks like you're noticing "skid" which means the PC returned by the
profiling is a little beyond the problem instruction because it is hard
for the CPU to stop in exactly the right place.
Typically the problem instruction will be a few before, not after. I
wouldn't think an unconditional branch would have much performance impact.
What CPU do you have? What event are you measuring? If you have a new
enough system you can look into using the (I think) ":p" flags to try to
get more precise results, if you aren't already.
Vince
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: 'annotate' confusingly reports 'add' instruction as taking most CPU time
2013-10-01 12:53 ` Vince Weaver
@ 2013-10-01 15:14 ` Johan Tibell
0 siblings, 0 replies; 3+ messages in thread
From: Johan Tibell @ 2013-10-01 15:14 UTC (permalink / raw)
To: Vince Weaver; +Cc: linux-perf-users
Hi Vince,
On Tue, Oct 1, 2013 at 5:53 AM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> It looks like you're noticing "skid" which means the PC returned by the
> profiling is a little beyond the problem instruction because it is hard
> for the CPU to stop in exactly the right place.
>
> Typically the problem instruction will be a few before, not after. I
> wouldn't think an unconditional branch would have much performance impact.
>
> What CPU do you have? What event are you measuring? If you have a new
> enough system you can look into using the (I think) ":p" flags to try to
> get more precise results, if you aren't already.
I found this table of modifiers:
https://perf.wiki.kernel.org/index.php/Tutorial#Modifiers
But it's not obvious to me if they will improve measurement accuracy.
Cheers,
Johan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-10-01 15:14 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-01 4:06 'annotate' confusingly reports 'add' instruction as taking most CPU time Johan Tibell
2013-10-01 12:53 ` Vince Weaver
2013-10-01 15:14 ` Johan Tibell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).