'annotate' confusingly reports 'add' instruction as taking most CPU time

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* 'annotate' confusingly reports 'add' instruction as taking most CPU time
@ 2013-10-01  4:06 Johan Tibell
  2013-10-01 12:53 ` Vince Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Johan Tibell @ 2013-10-01  4:06 UTC (permalink / raw)
  To: linux-perf-users

Hi,

I'm trying to use 'perf record' to find the hotspot inside a function.
The results I'm seeing are confusing. 'annotate' claims that a simple
'add' instruction is the main CPU time consumer"

        0.00 :          408246:       mov    0x8(%rbp),%rsi
                                                               0.31 :
        40824a:       mov    0x10(%rbp),%rcx
                                              0.00 :          40824e:
     lea    0x4(%rcx),%rdi
                                0.81 :          408252:       mov
0x6(%rbx),%rbx
                  0.00 :          408256:       mov
0x18(%rbx,%rax,8),%r8
     --->   27.58 :          40825b:       add    $0x20,%rbp

0.00 :          40825f:       jmpq   408338 <Main_zdwpolyzugo_info>
                                             0.00 :          408264:
    mov    $0x7b9609,%ebx
                           0.00 :          408269:       add
$0x20,%rbp
                    0.00 :          40826d:       jmpq   *0x0(%rbp)

        5.41 :          408270:       mov    0xd(%rbx),%rax

How shall I interpret this? Is it really the following jump that is to blame?

-- Johan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 'annotate' confusingly reports 'add' instruction as taking most CPU time
  2013-10-01  4:06 'annotate' confusingly reports 'add' instruction as taking most CPU time Johan Tibell
@ 2013-10-01 12:53 ` Vince Weaver
  2013-10-01 15:14   ` Johan Tibell
  0 siblings, 1 reply; 3+ messages in thread
From: Vince Weaver @ 2013-10-01 12:53 UTC (permalink / raw)
  To: Johan Tibell; +Cc: linux-perf-users

On Mon, 30 Sep 2013, Johan Tibell wrote:

> I'm trying to use 'perf record' to find the hotspot inside a function.
> The results I'm seeing are confusing. 'annotate' claims that a simple
> 'add' instruction is the main CPU time consumer"
> 
>         0.00 :          408246:       mov    0x8(%rbp),%rsi
>                                                                0.31 :
>         40824a:       mov    0x10(%rbp),%rcx
>                                               0.00 :          40824e:
>      lea    0x4(%rcx),%rdi
>                                 0.81 :          408252:       mov
> 0x6(%rbx),%rbx
>                   0.00 :          408256:       mov
> 0x18(%rbx,%rax,8),%r8
>      --->   27.58 :          40825b:       add    $0x20,%rbp
> 
> 0.00 :          40825f:       jmpq   408338 <Main_zdwpolyzugo_info>
>                                              0.00 :          408264:
>     mov    $0x7b9609,%ebx
>                            0.00 :          408269:       add
> $0x20,%rbp
>                     0.00 :          40826d:       jmpq   *0x0(%rbp)
> 
>         5.41 :          408270:       mov    0xd(%rbx),%rax
> 
> How shall I interpret this? Is it really the following jump that is to blame?

It looks like you're noticing "skid" which means the PC returned by the 
profiling is a little beyond the problem instruction because it is hard 
for the CPU to stop in exactly the right place.

Typically the problem instruction will be a few before, not after.  I
wouldn't think an unconditional branch would have much performance impact.

What CPU do you have?  What event are you measuring?  If you have a new 
enough system you can look into using the (I think) ":p" flags to try to 
get more precise results, if you aren't already.

Vince

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 'annotate' confusingly reports 'add' instruction as taking most CPU time
  2013-10-01 12:53 ` Vince Weaver
@ 2013-10-01 15:14   ` Johan Tibell
  0 siblings, 0 replies; 3+ messages in thread
From: Johan Tibell @ 2013-10-01 15:14 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-perf-users

Hi Vince,

On Tue, Oct 1, 2013 at 5:53 AM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> It looks like you're noticing "skid" which means the PC returned by the
> profiling is a little beyond the problem instruction because it is hard
> for the CPU to stop in exactly the right place.
>
> Typically the problem instruction will be a few before, not after.  I
> wouldn't think an unconditional branch would have much performance impact.
>
> What CPU do you have?  What event are you measuring?  If you have a new
> enough system you can look into using the (I think) ":p" flags to try to
> get more precise results, if you aren't already.

I found this table of modifiers:

    https://perf.wiki.kernel.org/index.php/Tutorial#Modifiers

But it's not obvious to me if they will improve measurement accuracy.

Cheers,
Johan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-10-01 15:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-01  4:06 'annotate' confusingly reports 'add' instruction as taking most CPU time Johan Tibell
2013-10-01 12:53 ` Vince Weaver
2013-10-01 15:14   ` Johan Tibell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).