From: Milian Wolff <milian.wolff@kdab.com>
To: "Jin, Yao" <yao.jin@linux.intel.com>
Cc: linux-perf-users <linux-perf-users@vger.kernel.org>
Subject: Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
Date: Fri, 13 Jan 2017 09:56:31 +0100
Message-ID: <11516352.9tecJW0nu1@milian-kdab2>
In-Reply-To: <953390cf-fb74-12ed-abea-34540ab26363@linux.intel.com>

On Wednesday, January 11, 2017 9:01:44 AM CET Jin, Yao wrote:
> Hi Wolff,
> 
> See my comments below
> 
> Thanks

Hey Yao, much appreciated!

> On 1/10/2017 11:03 PM, Milian Wolff wrote:
> > On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
> >> Hi Wolff,
> >> 
> >> I updated the patch series, adding inline searching for each
> >> callchain entry.
> > 
> > Hey Yao,
> > 
> > great! Is your git branch publicly accessible somewhere, so I can check it
> > out?
> 
> Sorry, no public git branch yet, but I will send my patch series to you
> in another mail thread.

Perfect, I'll apply and test it later today, I think.

> >> Now the result tested on my skl machine is:
> >> 
> >> # Overhead  Command  Shared Object        Symbol
> >> # ........  .......  ...................  ..................................
> >> #
> >> 
> >>       56.76%  inline   inline               [.] main
> >>       
> >>               ---main
> >>               
> >>                  __libc_start_main
> >>                  _start
> > 
> > This is still wrong, no? Tons of stuff gets inlined into main and it seems
> > as if your code is still not capable of displaying what that is. Please
> > refer to the output I pasted from my patch in the previous mail. I'm
> > explicitly looking for the cost of using <random> e.g.
> 
> That might not be an issue. I have checked the rip of each callchain entry
> and used "addr2line -e <binary> -i <rip>" to verify whether it's in an
> inlined function, and I get the same result.

Ah, this may actually be the following issue:

https://lists.fedorahosted.org/archives/list/elfutils-devel@lists.fedorahosted.org/thread/X5XUOW4DQ6DNE5SL4QPG3MGGQPJUVSCW/#G2L4JBSBRDZDZRPL6MF46SSPER5UME2G

I'll test and see whether addr - 1 produces better inlining information; I have
applied this workaround in my branch. But, of course, we'll need to fix this
properly, based on the approach that elfutils is taking.
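
For anyone following along, here is a minimal sketch of what that workaround
amounts to (not actual perf code; the helper name is made up): every non-leaf
callchain entry is a return address, i.e. it points just past the call
instruction, so feeding it verbatim to addr2line or elfutils can attribute it
to the wrong line and hide the inline chain. Subtracting one byte first lands
inside the call itself:

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only, not actual perf code: pick the address to feed into the
 * srcline/inline lookup.  The leaf frame uses the sampled instruction
 * pointer as-is; non-leaf frames hold return addresses, so step back one
 * byte to land inside the call instruction that produced them.
 */
static uint64_t addr_for_inline_lookup(uint64_t ip, bool is_leaf)
{
	return is_leaf ? ip : ip - 1;
}

That would also explain why checking the raw rip with addr2line -i, as you did
above, shows the same incomplete result.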

> There is probably a different rip getting sampled or a different sort key
> used. The previous example used:
> perf report --no-children --stdio --inline
> 
> If I use perf report --children --stdio --inline, the result is:
> 
> # Children      Self  Command  Shared Object        Symbol
> # ........  ........  .......  ...................  ..................................
> #
>      98.15%    56.76%  inline   inline               [.] main
> 
>              |--56.76%--_start
>              |
>              |          __libc_start_main
>              |          main
> 
>                         ---/usr/include/c++/5/bits/random.h:151 (inline)
>                            /usr/include/c++/5/bits/random.h:332 (inline)
>                            /usr/include/c++/5/bits/random.tcc:3328 (inline)
>                            /usr/include/c++/5/bits/random.h:185 (inline)
>                            /usr/include/c++/5/bits/random.h:1818 (inline)
>                            /usr/include/c++/5/bits/random.h:1809 (inline)
>                            /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>               --41.39%--main
>                         __hypot
> 
>                          --40.09%--__hypot_finite
> 
> Anyway, you can try my latest patch series and check what happens on
> your machine.

I will, thanks a lot again! Much appreciated.

Cheers

> >>       40.54%  inline   libm-2.23.so         [.] __hypot_finite
> >>       
> >>               ---__hypot_finite
> >>               
> >>                   --40.09%--__hypot
> >>                   
> >>                             ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>                                /usr/include/c++/5/bits/random.h:332 (inline)
> >>                                /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>                                /usr/include/c++/5/bits/random.h:185 (inline)
> >>                                /usr/include/c++/5/bits/random.h:1818 (inline)
> >>                                /usr/include/c++/5/bits/random.h:1809 (inline)
> >>                                /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >> 
> >>                             main
> >>                             __libc_start_main
> >>                             _start
> > 
> > Much better, great work! But again I have to ask the same question:
> > 
> > Why not integrate directly with the other call stack nodes? This would
> > solve the following problems:
> > 
> > - correct display of children cost for inlined frames, i.e. in a `perf
> > report --children --inline`, I suspect your branch won't actually show
> > any top-level entries for inline.cpp:14, correct?
> > 
> > - display/aggregation: currently you are always displaying the srcline in
> > your approach; what if I want to display/aggregate by function symbol
> > instead?
> 
> I just follow the behavior of addr2line. If I use the command line
> "addr2line -e <binary> -i <rip>", it only displays the srcline on my
> machine.
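
Note that addr2line can also print the function name for each inlined frame
when you add -f/--functions, so the information needed to aggregate by symbol
is available from the DWARF data. To illustrate what I mean by integrating with
the other call stack nodes, a purely hypothetical sketch (not perf's actual
data structures): each recovered inline frame would become an ordinary
callchain node carrying both the function symbol and the srcline, so perf
report could sort and aggregate by either one, and children cost would be
attributed to it like to any other frame.

#include <stdbool.h>

/*
 * Hypothetical sketch, not perf's actual data structures: an inlined frame
 * materialised as a regular callchain node, carrying both the function
 * symbol and the srcline so either one can serve as the sort key.
 */
struct inline_frame_node {
	const char *function;   /* e.g. "std::__detail::__mod<...>" */
	const char *srcline;    /* e.g. "random.h:151" */
	bool        inlined;    /* recovered from DWARF inline info */
};
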
> 
> > So, could we please discuss this?
> > 
> > Thanks a lot
> > 
> >> On 1/9/2017 4:41 PM, Milian Wolff wrote:
> >>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> >>>> Hi Wolff,
> >>>> 
> >>>> Thanks a lot for your test code!
> >>>> 
> >>>> I have updated my patch. The new output for your test code (binary name
> >>>> is inline) is:
> >>>> 
> >>>> # Overhead  Command  Shared Object        Symbol
> >>>> # ........  .......  ...................  ..................................
> >>>> #
> >>>> 
> >>>>        56.76%  inline   inline               [.] main
> >>>>        
> >>>>                ---main
> >>>>                
> >>>>                   __libc_start_main
> >>>>                   _start
> >>>>        
> >>>>        40.54%  inline   libm-2.23.so         [.] __hypot_finite
> >>>>        
> >>>>                ---__hypot_finite
> >>>>                
> >>>>                    --40.09%--__hypot
> >>>>                    
> >>>>                              main
> >>>>                              __libc_start_main
> >>>>                              _start
> >>>>         
> >>>>         1.30%  inline   libm-2.23.so         [.] __hypot
> >>>>         
> >>>>                ---__hypot
> >>>>                
> >>>>                   main
> >>>>                   
> >>>>                   ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>>>                   
> >>>>                      /usr/include/c++/5/bits/random.h:332 (inline)
> >>>>                      /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:185 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:1818 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:1809 (inline)
> >>>>                      /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >>>> 
> >>>>                   __libc_start_main
> >>>>                   _start
> >>>> 
> >>>> The inline entry is tagged with "(inline)". Is this the result you
> >>>> expected?
> >>> 
> >>> No, this is not the expected output. I expect to get information about
> >>> the inlined frames in the middle of the call stack; in this example,
> >>> that happens for the first two major hotspots.
> >>> 
> >>> Below is the output from my approach to this problem using my -g
> >>> srcline:
> >>> 
> >>> ~~~~~~~~~~~~~~ pasted here for readability:
> >>> https://paste.kde.org/pmq3bc1le
> >>> Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
> >>>     Overhead  Command  Shared Object     Symbol
> >>> -   62.24%  inline   inline            [.] main
> >>>      - 27.98% std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332
> >>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>>           std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>           std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>>           main ine.cpp:14
> >>>           __libc_start_main +241
> >>>           _start +4194346
> >>>      - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143
> >>>           std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151
> >>>           std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332
> >>>           std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332
> >>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>>           std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>           std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>>           main ine.cpp:14
> >>>           __libc_start_main +241
> >>>           _start +4194346
> >>>      + 4.14% main inline.cpp:13
> >>>      + 2.89% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>      + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> >>>      + 0.67% std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326
> >>> -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
> >>>      - __hypot_finite +179
> >>>         - 25.56% hypot +20
> >>>              std::__complex_abs complex:589
> >>>              std::abs<double> complex:597
> >>>              std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>>              std::norm<double> complex:664
> >>>              main ine.cpp:14
> >>>              __libc_start_main +241
> >>>              _start +4194346
> >>>           0.68% 0x40ead238ed1a2cf4
> >>> -    6.85%  inline   libm-2.24.so      [.] hypot
> >>>        hypot +40
> >>>        std::__complex_abs complex:589
> >>>        std::abs<double> complex:597
> >>>        std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>>        std::norm<double> complex:664
> >>>        main ine.cpp:14
> >>>        __libc_start_main +241
> >>>        _start +4194346
> >>> 
> >>> ~~~~~~~~~~~~~~
> >>> 
> >>> I hope you see the value of such a report versus what you are proposing.
> >>> 
> >>> Thanks


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
