linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
@ 2017-01-07 20:13 Milian Wolff
  2017-01-09  3:07 ` Jin, Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-07 20:13 UTC (permalink / raw)
  To: Jin, Yao; +Cc: linux-perf-users

Hey Jin,

I have finally had the time to test your patch series. Sadly, I hit a 
fundamental issue with it early on in my testing:

In my test case, I get with `perf report --no-children` e.g. the following:

    42.50%  lab_mandelbrot   libm-2.24.so                [.] __hypot_finite
            |
            ---__hypot_finite
               hypot
               main
               __libc_start_main
               _start

Enabling --inline does not change the output, even though hypot is inlined 
into main.

Looking at your code, it looks to me as if you are only trying to find 
inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only 
when unwinding failed (?, cf. (callchain_ret == 0)).

Is this understanding correct, or should your patch also add inliners for 
nodes in the middle of the callstack?

This approach makes the whole feature unusable for my use case. In my (really 
hacky) approach to adding inliners to the output of perf (https://github.com/
milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to 
lookup inliners for _every_ callchain entry. The result then there is with my 
branch and a `perf report -g srcline --stdio`:

    42.50%  lab_mandelbrot   libm-2.24.so                [.] __hypot_finite
            |
            ---__hypot_finite +387
               hypot +20
               std::__complex_abs complex:589
               std::abs<double> complex:597
               drawMandelbrot mandelbrot.h:39
               main main.cpp:55
               __libc_start_main +241
               _start +42

Note the three inliner entries in the middle of the callstack. The way I did 
it, one will also see "proper" entries for inlined functions in a top-down 
`perf report --children` output.

In general, your patch series looks much more polished than mine, so I would 
love to get it integrated. But with your approach, we will need lots of 
changes to the various browsers to get all features in. My hacky approach has 
the nice property that it simply adds "normal" callchain entries for inlined 
nodes - maybe you should do the same?

Thanks a lot for your work, much appreciated.

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff
@ 2017-01-09  3:07 ` Jin, Yao
  2017-01-09  8:02   ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-09  3:07 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Hi Wolff,

Thanks a lot for your feedback.

Have you tested with the browser mode? I mean without the "--stdio" option.

Could you give your test binary then I can try to reproduce the issue? I 
have tried with my tests but I could not reproduce.

Thanks

Jin Yao

On 1/8/2017 4:13 AM, Milian Wolff wrote:
> Hey Jin,
>
> I have finally had the time to test your patch series. Sadly, I hit a
> fundamental issue with it early on in my testing:
>
> In my test case, I get with `perf report --no-children` e.g. the following:
>
>      42.50%  lab_mandelbrot   libm-2.24.so                [.] __hypot_finite
>              |
>              ---__hypot_finite
>                 hypot
>                 main
>                 __libc_start_main
>                 _start
>
> Enabling --inline does not change the output, even though hypot is inlined
> into main.
>
> Looking at your code, it looks to me as if you are only trying to find
> inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only
> when unwinding failed (?, cf. (callchain_ret == 0)).
>
> Is this understanding correct, or should your patch also add inliners for
> nodes in the middle of the callstack?
>
> This approach makes the whole feature unusable for my use case. In my (really
> hacky) approach to adding inliners to the output of perf (https://github.com/
> milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to
> lookup inliners for _every_ callchain entry. The result then there is with my
> branch and a `perf report -g srcline --stdio`:
>
>      42.50%  lab_mandelbrot   libm-2.24.so                [.] __hypot_finite
>              |
>              ---__hypot_finite +387
>                 hypot +20
>                 std::__complex_abs complex:589
>                 std::abs<double> complex:597
>                 drawMandelbrot mandelbrot.h:39
>                 main main.cpp:55
>                 __libc_start_main +241
>                 _start +42
>
> Note the three inliner entries in the middle of the callstack. The way I did
> it, one will also see "proper" entries for inlined functions in a top-down
> `perf report --children` output.
>
> In general, your patch series looks much more polished than mine, so I would
> love to get it integrated. But with your approach, we will need lots of
> changes to the various browsers to get all features in. My hacky approach has
> the nice property that it simply adds "normal" callchain entries for inlined
> nodes - maybe you should do the same?
>
> Thanks a lot for your work, much appreciated.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-09  3:07 ` Jin, Yao
@ 2017-01-09  8:02   ` Milian Wolff
  2017-01-09  8:17     ` Jin, Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-09  8:02 UTC (permalink / raw)
  To: Jin, Yao; +Cc: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 6725 bytes --]

On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote:
> Hi Wolff,
> 
> Thanks a lot for your feedback.
> 
> Have you tested with the browser mode? I mean without the "--stdio" option.

Yes, I have done that:

~~~~~~~~~~~~
milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline
milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline
6.66016e+15
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ]
~~~~~~~~~~~~

Both with and without your patch (and passing --inline) I get:

~~~~~~~~~~~~
milian@milian-kdab2:/tmp$ perf report --no-children --inline
Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
  Overhead  Command  Shared Object     Symbol                                                                                                                                                                   
-   62.24%  inline   inline            [.] main                                                                                                                                                                 
     main                                                                                                                                                                                                       
     __libc_start_main                                                                                                                                                                                          
     _start                                                                                                                                                                                                     
-   26.24%  inline   libm-2.24.so      [.] __hypot_finite                                                                                                                                                       
   - __hypot_finite                                                                                                                                                                                             
      - 25.56% hypot                                                                                                                                                                                            
           main                                                                                                                                                                                                 
           __libc_start_main                                                                                                                                                                                    
           _start                                                                                                                                                                                               
        0.68% 0x40ead238ed1a2cf4                                                                                                                                                                                
-    6.85%  inline   libm-2.24.so      [.] hypot
     hypot
     main
     __libc_start_main                                                                                                                                                                                          
     _start
~~~~~~~~~~~~

> Could you give your test binary then I can try to reproduce the issue? I
> have tried with my tests but I could not reproduce.

The following code should reproduce the issue:

~~~~~~~~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 1000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
~~~~~~~~~~~~

Hope this helps

> On 1/8/2017 4:13 AM, Milian Wolff wrote:
> > Hey Jin,
> > 
> > I have finally had the time to test your patch series. Sadly, I hit a
> > fundamental issue with it early on in my testing:
> > 
> > In my test case, I get with `perf report --no-children` e.g. the 
following:
> >      42.50%  lab_mandelbrot   libm-2.24.so                [.]
> >      __hypot_finite
> >      
> >              ---__hypot_finite
> >              
> >                 hypot
> >                 main
> >                 __libc_start_main
> >                 _start
> > 
> > Enabling --inline does not change the output, even though hypot is inlined
> > into main.
> > 
> > Looking at your code, it looks to me as if you are only trying to find
> > inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and
> > only
> > when unwinding failed (?, cf. (callchain_ret == 0)).
> > 
> > Is this understanding correct, or should your patch also add inliners for
> > nodes in the middle of the callstack?
> > 
> > This approach makes the whole feature unusable for my use case. In my
> > (really hacky) approach to adding inliners to the output of perf
> > (https://github.com/
> > milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided
> > to lookup inliners for _every_ callchain entry. The result then there is
> > with my> 
> > branch and a `perf report -g srcline --stdio`:
> >      42.50%  lab_mandelbrot   libm-2.24.so                [.]
> >      __hypot_finite
> >      
> >              ---__hypot_finite +387
> >              
> >                 hypot +20
> >                 std::__complex_abs complex:589
> >                 std::abs<double> complex:597
> >                 drawMandelbrot mandelbrot.h:39
> >                 main main.cpp:55
> >                 __libc_start_main +241
> >                 _start +42
> > 
> > Note the three inliner entries in the middle of the callstack. The way I
> > did it, one will also see "proper" entries for inlined functions in a
> > top-down `perf report --children` output.
> > 
> > In general, your patch series looks much more polished than mine, so I
> > would love to get it integrated. But with your approach, we will need
> > lots of changes to the various browsers to get all features in. My hacky
> > approach has the nice property that it simply adds "normal" callchain
> > entries for inlined nodes - maybe you should do the same?
> > 
> > Thanks a lot for your work, much appreciated.


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-09  8:02   ` Milian Wolff
@ 2017-01-09  8:17     ` Jin, Yao
  2017-01-09  8:41       ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-09  8:17 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Hi Wolff,

Thanks a lot for your test code!

I have updated my patch. The new output for your test code (binary name 
is inline) is:

# Overhead  Command  Shared Object        Symbol
# ........  .......  ................... ..................................
#
     56.76%  inline   inline               [.] main
             |
             ---main
                __libc_start_main
                _start

     40.54%  inline   libm-2.23.so         [.] __hypot_finite
             |
             ---__hypot_finite
                |
                 --40.09%--__hypot
                           main
                           __libc_start_main
                           _start

      1.30%  inline   libm-2.23.so         [.] __hypot
             |
             ---__hypot
                main
                |
                ---/usr/include/c++/5/bits/random.h:151 (inline)
                   /usr/include/c++/5/bits/random.h:332 (inline)
                   /usr/include/c++/5/bits/random.tcc:3328 (inline)
                   /usr/include/c++/5/bits/random.h:185 (inline)
                   /usr/include/c++/5/bits/random.h:1818 (inline)
                   /usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
                __libc_start_main
                _start

The inline entry is tagged with "(inline)". Is this result within 
expectation?

Thanks
Jin Yao

On 1/9/2017 4:02 PM, Milian Wolff wrote:
> On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> Thanks a lot for your feedback.
>>
>> Have you tested with the browser mode? I mean without the "--stdio" option.
> Yes, I have done that:
>
> ~~~~~~~~~~~~
> milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline
> milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline
> 6.66016e+15
> [ perf record: Woken up 5 times to write data ]
> [ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ]
> ~~~~~~~~~~~~
>
> Both with and without your patch (and passing --inline) I get:
>
> ~~~~~~~~~~~~
> milian@milian-kdab2:/tmp$ perf report --no-children --inline
> Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
>    Overhead  Command  Shared Object     Symbol
> -   62.24%  inline   inline            [.] main
>       main
>       __libc_start_main
>       _start
> -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
>     - __hypot_finite
>        - 25.56% hypot
>             main
>             __libc_start_main
>             _start
>          0.68% 0x40ead238ed1a2cf4
> -    6.85%  inline   libm-2.24.so      [.] hypot
>       hypot
>       main
>       __libc_start_main
>       _start
> ~~~~~~~~~~~~
>
>> Could you give your test binary then I can try to reproduce the issue? I
>> have tried with my tests but I could not reproduce.
> The following code should reproduce the issue:
>
> ~~~~~~~~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
>      uniform_real_distribution<double> uniform(-1E5, 1E5);
>      default_random_engine engine;
>      double s = 0;
>      for (int i = 0; i < 1000000; ++i) {
>          s += norm(complex<double>(uniform(engine), uniform(engine)));
>      }
>      cout << s << '\n';
>      return 0;
> }
> ~~~~~~~~~~~~
>
> Hope this helps
>
>> On 1/8/2017 4:13 AM, Milian Wolff wrote:
>>> Hey Jin,
>>>
>>> I have finally had the time to test your patch series. Sadly, I hit a
>>> fundamental issue with it early on in my testing:
>>>
>>> In my test case, I get with `perf report --no-children` e.g. the
> following:
>>>       42.50%  lab_mandelbrot   libm-2.24.so                [.]
>>>       __hypot_finite
>>>       
>>>               ---__hypot_finite
>>>               
>>>                  hypot
>>>                  main
>>>                  __libc_start_main
>>>                  _start
>>>
>>> Enabling --inline does not change the output, even though hypot is inlined
>>> into main.
>>>
>>> Looking at your code, it looks to me as if you are only trying to find
>>> inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and
>>> only
>>> when unwinding failed (?, cf. (callchain_ret == 0)).
>>>
>>> Is this understanding correct, or should your patch also add inliners for
>>> nodes in the middle of the callstack?
>>>
>>> This approach makes the whole feature unusable for my use case. In my
>>> (really hacky) approach to adding inliners to the output of perf
>>> (https://github.com/
>>> milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided
>>> to lookup inliners for _every_ callchain entry. The result then there is
>>> with my>
>>> branch and a `perf report -g srcline --stdio`:
>>>       42.50%  lab_mandelbrot   libm-2.24.so                [.]
>>>       __hypot_finite
>>>       
>>>               ---__hypot_finite +387
>>>               
>>>                  hypot +20
>>>                  std::__complex_abs complex:589
>>>                  std::abs<double> complex:597
>>>                  drawMandelbrot mandelbrot.h:39
>>>                  main main.cpp:55
>>>                  __libc_start_main +241
>>>                  _start +42
>>>
>>> Note the three inliner entries in the middle of the callstack. The way I
>>> did it, one will also see "proper" entries for inlined functions in a
>>> top-down `perf report --children` output.
>>>
>>> In general, your patch series looks much more polished than mine, so I
>>> would love to get it integrated. But with your approach, we will need
>>> lots of changes to the various browsers to get all features in. My hacky
>>> approach has the nice property that it simply adds "normal" callchain
>>> entries for inlined nodes - maybe you should do the same?
>>>
>>> Thanks a lot for your work, much appreciated.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-09  8:17     ` Jin, Yao
@ 2017-01-09  8:41       ` Milian Wolff
  2017-01-10  1:53         ` Jin, Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-09  8:41 UTC (permalink / raw)
  To: Jin, Yao; +Cc: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 4980 bytes --]

On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> Hi Wolff,
> 
> Thanks a lot for your test code!
> 
> I have updated my patch. The new output for your test code (binary name
> is inline) is:
> 
> # Overhead  Command  Shared Object        Symbol
> # ........  .......  ................... ..................................
> #
>      56.76%  inline   inline               [.] main
> 
>              ---main
>                 __libc_start_main
>                 _start
> 
>      40.54%  inline   libm-2.23.so         [.] __hypot_finite
> 
>              ---__hypot_finite
> 
>                  --40.09%--__hypot
>                            main
>                            __libc_start_main
>                            _start
> 
>       1.30%  inline   libm-2.23.so         [.] __hypot
> 
>              ---__hypot
>                 main
> 
>                 ---/usr/include/c++/5/bits/random.h:151 (inline)
>                    /usr/include/c++/5/bits/random.h:332 (inline)
>                    /usr/include/c++/5/bits/random.tcc:3328 (inline)
>                    /usr/include/c++/5/bits/random.h:185 (inline)
>                    /usr/include/c++/5/bits/random.h:1818 (inline)
>                    /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>                 __libc_start_main
>                 _start
> 
> The inline entry is tagged with "(inline)". Is this result within
> expectation?

No, this is not the expected output. I expect to get information about the 
inlined frames in the middle of the call stack, in this example that happens 
for the first two major hotspots.

Below is the output from my approach to this problem using my -g srcline:

~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le
Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
  Overhead  Command  Shared Object     Symbol
-   62.24%  inline   inline            [.] main
   - 27.98% std::generate_canonical<double, 53ul, 
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
random.tcc:3332
        std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 
16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
        std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
        std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > random.h:1809
        main ine.cpp:14
        __libc_start_main +241
        _start +4194346
   - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, 
true, true>::__calc random.h:143
        std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> 
random.h:151
        std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 
2147483647ul>::operator() random.h:332
        std::generate_canonical<double, 53ul, 
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
random.tcc:3332
        std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 
16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
        std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
        std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > random.h:1809
        main ine.cpp:14
        __libc_start_main +241
        _start +4194346
   + 4.14% main inline.cpp:13
   + 2.89% 
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
   + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
   + 0.67% std::generate_canonical<double, 53ul, 
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
random.tcc:3326
-   26.24%  inline   libm-2.24.so      [.] __hypot_finite
   - __hypot_finite +179
      - 25.56% hypot +20
           std::__complex_abs complex:589
           std::abs<double> complex:597
           std::_Norm_helper<true>::_S_do_it<double> complex:654
           std::norm<double> complex:664
           main ine.cpp:14
           __libc_start_main +241
           _start +4194346
        0.68% 0x40ead238ed1a2cf4
-    6.85%  inline   libm-2.24.so      [.] hypot
     hypot +40
     std::__complex_abs complex:589
     std::abs<double> complex:597
     std::_Norm_helper<true>::_S_do_it<double> complex:654
     std::norm<double> complex:664
     main ine.cpp:14
     __libc_start_main +241
     _start +4194346
~~~~~~~~~~~~~~

I hope you see the value of such a report versus what you are proposing.

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-09  8:41       ` Milian Wolff
@ 2017-01-10  1:53         ` Jin, Yao
  2017-01-10 15:03           ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-10  1:53 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Hi Wolff,

I update the patch series with adding inline searching for each 
callchain entry.

Now the result tested on my skl machine is:

# Overhead  Command  Shared Object        Symbol
# ........  .......  ................... ..................................
#
     56.76%  inline   inline               [.] main
             |
             ---main
                __libc_start_main
                _start

     40.54%  inline   libm-2.23.so         [.] __hypot_finite
             |
             ---__hypot_finite
                |
                 --40.09%--__hypot
                           |
                           ---/usr/include/c++/5/bits/random.h:151 (inline)
                              /usr/include/c++/5/bits/random.h:332 (inline)
/usr/include/c++/5/bits/random.tcc:3328 (inline)
                              /usr/include/c++/5/bits/random.h:185 (inline)
                              /usr/include/c++/5/bits/random.h:1818 (inline)
                              /usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
                           main
                           __libc_start_main
                           _start

      1.30%  inline   libm-2.23.so         [.] __hypot
             |
             ---__hypot
                |
                ---/usr/include/c++/5/bits/random.h:151 (inline)
                   /usr/include/c++/5/bits/random.h:332 (inline)
                   /usr/include/c++/5/bits/random.tcc:3328 (inline)
                   /usr/include/c++/5/bits/random.h:185 (inline)
                   /usr/include/c++/5/bits/random.h:1818 (inline)
                   /usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
                main
                __libc_start_main
                _start

Thanks

Jin Yao

On 1/9/2017 4:41 PM, Milian Wolff wrote:
> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> Thanks a lot for your test code!
>>
>> I have updated my patch. The new output for your test code (binary name
>> is inline) is:
>>
>> # Overhead  Command  Shared Object        Symbol
>> # ........  .......  ................... ..................................
>> #
>>       56.76%  inline   inline               [.] main
>>
>>               ---main
>>                  __libc_start_main
>>                  _start
>>
>>       40.54%  inline   libm-2.23.so         [.] __hypot_finite
>>
>>               ---__hypot_finite
>>
>>                   --40.09%--__hypot
>>                             main
>>                             __libc_start_main
>>                             _start
>>
>>        1.30%  inline   libm-2.23.so         [.] __hypot
>>
>>               ---__hypot
>>                  main
>>
>>                  ---/usr/include/c++/5/bits/random.h:151 (inline)
>>                     /usr/include/c++/5/bits/random.h:332 (inline)
>>                     /usr/include/c++/5/bits/random.tcc:3328 (inline)
>>                     /usr/include/c++/5/bits/random.h:185 (inline)
>>                     /usr/include/c++/5/bits/random.h:1818 (inline)
>>                     /usr/include/c++/5/bits/random.h:1809 (inline)
>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>>                  __libc_start_main
>>                  _start
>>
>> The inline entry is tagged with "(inline)". Is this result within
>> expectation?
> No, this is not the expected output. I expect to get information about the
> inlined frames in the middle of the call stack, in this example that happens
> for the first two major hotspots.
>
> Below is the output from my approach to this problem using my -g srcline:
>
> ~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le
> Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
>    Overhead  Command  Shared Object     Symbol
> -   62.24%  inline   inline            [.] main
>     - 27.98% std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3332
>          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>          main ine.cpp:14
>          __libc_start_main +241
>          _start +4194346
>     - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul,
> true, true>::__calc random.h:143
>          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
> random.h:151
>          std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> 2147483647ul>::operator() random.h:332
>          std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3332
>          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>          main ine.cpp:14
>          __libc_start_main +241
>          _start +4194346
>     + 4.14% main inline.cpp:13
>     + 2.89%
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>     + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
>     + 0.67% std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3326
> -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
>     - __hypot_finite +179
>        - 25.56% hypot +20
>             std::__complex_abs complex:589
>             std::abs<double> complex:597
>             std::_Norm_helper<true>::_S_do_it<double> complex:654
>             std::norm<double> complex:664
>             main ine.cpp:14
>             __libc_start_main +241
>             _start +4194346
>          0.68% 0x40ead238ed1a2cf4
> -    6.85%  inline   libm-2.24.so      [.] hypot
>       hypot +40
>       std::__complex_abs complex:589
>       std::abs<double> complex:597
>       std::_Norm_helper<true>::_S_do_it<double> complex:654
>       std::norm<double> complex:664
>       main ine.cpp:14
>       __libc_start_main +241
>       _start +4194346
> ~~~~~~~~~~~~~~
>
> I hope you see the value of such a report versus what you are proposing.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-10  1:53         ` Jin, Yao
@ 2017-01-10 15:03           ` Milian Wolff
  2017-01-11  1:01             ` Jin, Yao
  0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-10 15:03 UTC (permalink / raw)
  To: Jin, Yao; +Cc: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 8129 bytes --]

On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
> Hi Wolff,
> 
> I update the patch series with adding inline searching for each
> callchain entry.

Hey Yao,

great! Is your git branch publicly accessible somewhere, so I can check it 
out?

> Now the result tested on my skl machine is:
> 
> # Overhead  Command  Shared Object        Symbol
> # ........  .......  ................... ..................................
> #
>      56.76%  inline   inline               [.] main
> 
>              ---main
>                 __libc_start_main
>                 _start

This is still wrong, no? Tons of stuff gets inlined into main and it seems as 
if your code is still not capable of displaying what that is. Please refer to 
the output I pasted from my patch in the previous mail. I'm explicitly looking 
for the cost of using <random> e.g.

>      40.54%  inline   libm-2.23.so         [.] __hypot_finite
> 
>              ---__hypot_finite
> 
>                  --40.09%--__hypot
> 
>                            ---/usr/include/c++/5/bits/random.h:151 (inline)
>                               /usr/include/c++/5/bits/random.h:332 (inline)
> /usr/include/c++/5/bits/random.tcc:3328 (inline)
>                               /usr/include/c++/5/bits/random.h:185 (inline)
>                               /usr/include/c++/5/bits/random.h:1818 (inline)
> /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>                            main
>                            __libc_start_main
>                            _start

Much better, great work! But again I have to ask the same question:

Why not integrate directly with the other call stack nodes? This would solve 
the following problems:

- correct display of children cost for inlined frames, i.e. in a `perf report 
--children --inline`, I suspect your branch won't actually show any top-level 
entries for inline.cpp:14, correct?

- display/aggregation: currently you are always displaying srcline in your 
approach, what if I want to display/aggregate by function symbol instead?

So, could we please discuss this please?

Thanks a lot

> On 1/9/2017 4:41 PM, Milian Wolff wrote:
> > On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> >> Hi Wolff,
> >> 
> >> Thanks a lot for your test code!
> >> 
> >> I have updated my patch. The new output for your test code (binary name
> >> is inline) is:
> >> 
> >> # Overhead  Command  Shared Object        Symbol
> >> # ........  .......  ...................
> >> ..................................
> >> #
> >> 
> >>       56.76%  inline   inline               [.] main
> >>       
> >>               ---main
> >>               
> >>                  __libc_start_main
> >>                  _start
> >>       
> >>       40.54%  inline   libm-2.23.so         [.] __hypot_finite
> >>       
> >>               ---__hypot_finite
> >>               
> >>                   --40.09%--__hypot
> >>                   
> >>                             main
> >>                             __libc_start_main
> >>                             _start
> >>        
> >>        1.30%  inline   libm-2.23.so         [.] __hypot
> >>        
> >>               ---__hypot
> >>               
> >>                  main
> >>                  
> >>                  ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>                  
> >>                     /usr/include/c++/5/bits/random.h:332 (inline)
> >>                     /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>                     /usr/include/c++/5/bits/random.h:185 (inline)
> >>                     /usr/include/c++/5/bits/random.h:1818 (inline)
> >>                     /usr/include/c++/5/bits/random.h:1809 (inline)
> >> 
> >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >> 
> >>                  __libc_start_main
> >>                  _start
> >> 
> >> The inline entry is tagged with "(inline)". Is this result within
> >> expectation?
> > 
> > No, this is not the expected output. I expect to get information about the
> > inlined frames in the middle of the call stack, in this example that
> > happens for the first two major hotspots.
> > 
> > Below is the output from my approach to this problem using my -g srcline:
> > 
> > ~~~~~~~~~~~~~~ pasted here for readability:
> > https://paste.kde.org/pmq3bc1le
> > Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
> > 
> >    Overhead  Command  Shared Object     Symbol
> > 
> > -   62.24%  inline   inline            [.] main
> > 
> >     - 27.98% std::generate_canonical<double, 53ul,
> > 
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3332
> > 
> >          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
> >          long,
> > 
> > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> > 
> >          std::uniform_real_distribution<double>::operator()<std::linear_co
> >          ngruential_engine<unsigned> 
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> > 
> >          std::uniform_real_distribution<double>::operator()<std::linear_co
> >          ngruential_engine<unsigned> 
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> > 
> >          main ine.cpp:14
> >          __libc_start_main +241
> >          _start +4194346
> >     
> >     - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
> >     0ul,
> > 
> > true, true>::__calc random.h:143
> > 
> >          std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
> > 
> > random.h:151
> > 
> >          std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> > 
> > 2147483647ul>::operator() random.h:332
> > 
> >          std::generate_canonical<double, 53ul,
> > 
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3332
> > 
> >          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
> >          long,
> > 
> > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> > 
> >          std::uniform_real_distribution<double>::operator()<std::linear_co
> >          ngruential_engine<unsigned> 
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> > 
> >          std::uniform_real_distribution<double>::operator()<std::linear_co
> >          ngruential_engine<unsigned> 
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> > 
> >          main ine.cpp:14
> >          __libc_start_main +241
> >          _start +4194346
> >     
> >     + 4.14% main inline.cpp:13
> >     + 2.89%
> > 
> > std::uniform_real_distribution<double>::operator()<std::linear_congruentia
> > l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> > 
> >     + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> >     + 0.67% std::generate_canonical<double, 53ul,
> > 
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3326
> > -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
> > 
> >     - __hypot_finite +179
> >     
> >        - 25.56% hypot +20
> >        
> >             std::__complex_abs complex:589
> >             std::abs<double> complex:597
> >             std::_Norm_helper<true>::_S_do_it<double> complex:654
> >             std::norm<double> complex:664
> >             main ine.cpp:14
> >             __libc_start_main +241
> >             _start +4194346
> >          
> >          0.68% 0x40ead238ed1a2cf4
> > 
> > -    6.85%  inline   libm-2.24.so      [.] hypot
> > 
> >       hypot +40
> >       std::__complex_abs complex:589
> >       std::abs<double> complex:597
> >       std::_Norm_helper<true>::_S_do_it<double> complex:654
> >       std::norm<double> complex:664
> >       main ine.cpp:14
> >       __libc_start_main +241
> >       _start +4194346
> > 
> > ~~~~~~~~~~~~~~
> > 
> > I hope you see the value of such a report versus what you are proposing.
> > 
> > Thanks


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-10 15:03           ` Milian Wolff
@ 2017-01-11  1:01             ` Jin, Yao
  2017-01-13  8:56               ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-11  1:01 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Hi Wolff,

See my comments below

Thanks

Jin Yao

On 1/10/2017 11:03 PM, Milian Wolff wrote:
> On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> I update the patch series with adding inline searching for each
>> callchain entry.
> Hey Yao,
>
> great! Is your git branch publicly accessible somewhere, so I can check it
> out?
Sorry, no public git branch yet, but I will send my patch series to you 
in another mail thread.
>> Now the result tested on my skl machine is:
>>
>> # Overhead  Command  Shared Object        Symbol
>> # ........  .......  ................... ..................................
>> #
>>       56.76%  inline   inline               [.] main
>>
>>               ---main
>>                  __libc_start_main
>>                  _start
> This is still wrong, no? Tons of stuff gets inlined into main and it seems as
> if your code is still not capable of displaying what that is. Please refer to
> the output I pasted from my patch in the previous mail. I'm explicitly looking
> for the cost of using <random> e.g.
That might be not an issue. I have checked rip of each callchain entry 
and use "addr2line -e <binary> -i <rip>" to verify if it's in inline 
function, same result.
There is probably different rip getting sampled or different sortkey 
used. The previous example is:
perf report --no-children --stdio --inline

If I use the perf report --children --stdio --inline, the result is

# Children      Self  Command  Shared Object        Symbol
# ........  ........  .......  ................... 
..................................
#
     98.15%    56.76%  inline   inline               [.] main
             |
             |--56.76%--_start
             |          __libc_start_main
             |          main
             |
                        |
                        ---/usr/include/c++/5/bits/random.h:151 (inline)
                           /usr/include/c++/5/bits/random.h:332 (inline)
                           /usr/include/c++/5/bits/random.tcc:3328 (inline)
                           /usr/include/c++/5/bits/random.h:185 (inline)
                           /usr/include/c++/5/bits/random.h:1818 (inline)
                           /usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
              --41.39%--main
                        __hypot
                        |
                         --40.09%--__hypot_finite

Anyway, you can try my latest patch series and check what happens on 
your machine.
>>       40.54%  inline   libm-2.23.so         [.] __hypot_finite
>>
>>               ---__hypot_finite
>>
>>                   --40.09%--__hypot
>>
>>                             ---/usr/include/c++/5/bits/random.h:151 (inline)
>>                                /usr/include/c++/5/bits/random.h:332 (inline)
>> /usr/include/c++/5/bits/random.tcc:3328 (inline)
>>                                /usr/include/c++/5/bits/random.h:185 (inline)
>>                                /usr/include/c++/5/bits/random.h:1818 (inline)
>> /usr/include/c++/5/bits/random.h:1809 (inline)
>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>>                             main
>>                             __libc_start_main
>>                             _start
> Much better, great work! But again I have to ask the same question:
>
> Why not integrate directly with the other call stack nodes? This would solve
> the following problems:
>
> - correct display of children cost for inlined frames, i.e. in a `perf report
> --children --inline`, I suspect your branch won't actually show any top-level
> entries for inline.cpp:14, correct?
>
> - display/aggregation: currently you are always displaying srcline in your
> approach, what if I want to display/aggregate by function symbol instead?
I just follow the behavior of addr2line. If I use the command line 
"addr2line -e <binary> -i <rip>", it just display the srcline on my 
machine.
> So, could we please discuss this please?
>
> Thanks a lot
>
>> On 1/9/2017 4:41 PM, Milian Wolff wrote:
>>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
>>>> Hi Wolff,
>>>>
>>>> Thanks a lot for your test code!
>>>>
>>>> I have updated my patch. The new output for your test code (binary name
>>>> is inline) is:
>>>>
>>>> # Overhead  Command  Shared Object        Symbol
>>>> # ........  .......  ...................
>>>> ..................................
>>>> #
>>>>
>>>>        56.76%  inline   inline               [.] main
>>>>        
>>>>                ---main
>>>>                
>>>>                   __libc_start_main
>>>>                   _start
>>>>        
>>>>        40.54%  inline   libm-2.23.so         [.] __hypot_finite
>>>>        
>>>>                ---__hypot_finite
>>>>                
>>>>                    --40.09%--__hypot
>>>>                    
>>>>                              main
>>>>                              __libc_start_main
>>>>                              _start
>>>>         
>>>>         1.30%  inline   libm-2.23.so         [.] __hypot
>>>>         
>>>>                ---__hypot
>>>>                
>>>>                   main
>>>>                   
>>>>                   ---/usr/include/c++/5/bits/random.h:151 (inline)
>>>>                   
>>>>                      /usr/include/c++/5/bits/random.h:332 (inline)
>>>>                      /usr/include/c++/5/bits/random.tcc:3328 (inline)
>>>>                      /usr/include/c++/5/bits/random.h:185 (inline)
>>>>                      /usr/include/c++/5/bits/random.h:1818 (inline)
>>>>                      /usr/include/c++/5/bits/random.h:1809 (inline)
>>>>
>>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>>>>
>>>>                   __libc_start_main
>>>>                   _start
>>>>
>>>> The inline entry is tagged with "(inline)". Is this result within
>>>> expectation?
>>> No, this is not the expected output. I expect to get information about the
>>> inlined frames in the middle of the call stack, in this example that
>>> happens for the first two major hotspots.
>>>
>>> Below is the output from my approach to this problem using my -g srcline:
>>>
>>> ~~~~~~~~~~~~~~ pasted here for readability:
>>> https://paste.kde.org/pmq3bc1le
>>> Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
>>>
>>>     Overhead  Command  Shared Object     Symbol
>>>
>>> -   62.24%  inline   inline            [.] main
>>>
>>>      - 27.98% std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3332
>>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
>>>           long,
>>>
>>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>>>
>>>           std::uniform_real_distribution<double>::operator()<std::linear_co
>>>           ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>>           std::uniform_real_distribution<double>::operator()<std::linear_co
>>>           ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>>>
>>>           main ine.cpp:14
>>>           __libc_start_main +241
>>>           _start +4194346
>>>      
>>>      - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
>>>      0ul,
>>>
>>> true, true>::__calc random.h:143
>>>
>>>           std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
>>>
>>> random.h:151
>>>
>>>           std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
>>>
>>> 2147483647ul>::operator() random.h:332
>>>
>>>           std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3332
>>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
>>>           long,
>>>
>>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>>>
>>>           std::uniform_real_distribution<double>::operator()<std::linear_co
>>>           ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>>           std::uniform_real_distribution<double>::operator()<std::linear_co
>>>           ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>>>
>>>           main ine.cpp:14
>>>           __libc_start_main +241
>>>           _start +4194346
>>>      
>>>      + 4.14% main inline.cpp:13
>>>      + 2.89%
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_congruentia
>>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>>      + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
>>>      + 0.67% std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3326
>>> -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
>>>
>>>      - __hypot_finite +179
>>>      
>>>         - 25.56% hypot +20
>>>         
>>>              std::__complex_abs complex:589
>>>              std::abs<double> complex:597
>>>              std::_Norm_helper<true>::_S_do_it<double> complex:654
>>>              std::norm<double> complex:664
>>>              main ine.cpp:14
>>>              __libc_start_main +241
>>>              _start +4194346
>>>           
>>>           0.68% 0x40ead238ed1a2cf4
>>>
>>> -    6.85%  inline   libm-2.24.so      [.] hypot
>>>
>>>        hypot +40
>>>        std::__complex_abs complex:589
>>>        std::abs<double> complex:597
>>>        std::_Norm_helper<true>::_S_do_it<double> complex:654
>>>        std::norm<double> complex:664
>>>        main ine.cpp:14
>>>        __libc_start_main +241
>>>        _start +4194346
>>>
>>> ~~~~~~~~~~~~~~
>>>
>>> I hope you see the value of such a report versus what you are proposing.
>>>
>>> Thanks
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
  2017-01-11  1:01             ` Jin, Yao
@ 2017-01-13  8:56               ` Milian Wolff
  0 siblings, 0 replies; 9+ messages in thread
From: Milian Wolff @ 2017-01-13  8:56 UTC (permalink / raw)
  To: Jin, Yao; +Cc: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 11686 bytes --]

On Wednesday, January 11, 2017 9:01:44 AM CET Jin, Yao wrote:
> Hi Wolff,
> 
> See my comments below
> 
> Thanks

Hey Yao, much appreciated!

> On 1/10/2017 11:03 PM, Milian Wolff wrote:
> > On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
> >> Hi Wolff,
> >> 
> >> I update the patch series with adding inline searching for each
> >> callchain entry.
> > 
> > Hey Yao,
> > 
> > great! Is your git branch publicly accessible somewhere, so I can check it
> > out?
> 
> Sorry, no public git branch yet, but I will send my patch series to you
> in another mail thread.

Perfect, I'll apply and test it later today, I think.

> >> Now the result tested on my skl machine is:
> >> 
> >> # Overhead  Command  Shared Object        Symbol
> >> # ........  .......  ...................
> >> ..................................
> >> #
> >> 
> >>       56.76%  inline   inline               [.] main
> >>       
> >>               ---main
> >>               
> >>                  __libc_start_main
> >>                  _start
> > 
> > This is still wrong, no? Tons of stuff gets inlined into main and it seems
> > as if your code is still not capable of displaying what that is. Please
> > refer to the output I pasted from my patch in the previous mail. I'm
> > explicitly looking for the cost of using <random> e.g.
> 
> That might be not an issue. I have checked rip of each callchain entry
> and use "addr2line -e <binary> -i <rip>" to verify if it's in inline
> function, same result.

Ah, this may actually be the following issue:

https://lists.fedorahosted.org/archives/list/elfutils-devel@lists.fedorahosted.org/thread/X5XUOW4DQ6DNE5SL4QPG3MGGQPJUVSCW/
#G2L4JBSBRDZDZRPL6MF46SSPER5UME2G

I'll test and see if addr - 1 produces better inlining information, I have 
applied this workaround in my branch. But, of course, we'll need to fix this 
properly, based on the approach that elfutils is taking.

> There is probably different rip getting sampled or different sortkey
> used. The previous example is:
> perf report --no-children --stdio --inline
> 
> If I use the perf report --children --stdio --inline, the result is
> 
> # Children      Self  Command  Shared Object        Symbol
> # ........  ........  .......  ...................
> ..................................
> #
>      98.15%    56.76%  inline   inline               [.] main
> 
>              |--56.76%--_start
>              |
>              |          __libc_start_main
>              |          main
> 
>                         ---/usr/include/c++/5/bits/random.h:151 (inline)
>                            /usr/include/c++/5/bits/random.h:332 (inline)
>                            /usr/include/c++/5/bits/random.tcc:3328 (inline)
>                            /usr/include/c++/5/bits/random.h:185 (inline)
>                            /usr/include/c++/5/bits/random.h:1818 (inline)
>                            /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>               --41.39%--main
>                         __hypot
> 
>                          --40.09%--__hypot_finite
> 
> Anyway, you can try my latest patch series and check what happens on
> your machine.

I will, thanks a lot again! Much appreciated.

Cheers

> >>       40.54%  inline   libm-2.23.so         [.] __hypot_finite
> >>       
> >>               ---__hypot_finite
> >>               
> >>                   --40.09%--__hypot
> >>                   
> >>                             ---/usr/include/c++/5/bits/random.h:151
> >>                             (inline)
> >>                             
> >>                                /usr/include/c++/5/bits/random.h:332
> >>                                (inline)
> >> 
> >> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >> 
> >>                                /usr/include/c++/5/bits/random.h:185
> >>                                (inline)
> >>                                /usr/include/c++/5/bits/random.h:1818
> >>                                (inline)
> >> 
> >> /usr/include/c++/5/bits/random.h:1809 (inline)
> >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >> 
> >>                             main
> >>                             __libc_start_main
> >>                             _start
> > 
> > Much better, great work! But again I have to ask the same question:
> > 
> > Why not integrate directly with the other call stack nodes? This would
> > solve the following problems:
> > 
> > - correct display of children cost for inlined frames, i.e. in a `perf
> > report --children --inline`, I suspect your branch won't actually show
> > any top-level entries for inline.cpp:14, correct?
> > 
> > - display/aggregation: currently you are always displaying srcline in your
> > approach, what if I want to display/aggregate by function symbol instead?
> 
> I just follow the behavior of addr2line. If I use the command line
> "addr2line -e <binary> -i <rip>", it just display the srcline on my
> machine.
> 
> > So, could we please discuss this please?
> > 
> > Thanks a lot
> > 
> >> On 1/9/2017 4:41 PM, Milian Wolff wrote:
> >>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> >>>> Hi Wolff,
> >>>> 
> >>>> Thanks a lot for your test code!
> >>>> 
> >>>> I have updated my patch. The new output for your test code (binary name
> >>>> is inline) is:
> >>>> 
> >>>> # Overhead  Command  Shared Object        Symbol
> >>>> # ........  .......  ...................
> >>>> ..................................
> >>>> #
> >>>> 
> >>>>        56.76%  inline   inline               [.] main
> >>>>        
> >>>>                ---main
> >>>>                
> >>>>                   __libc_start_main
> >>>>                   _start
> >>>>        
> >>>>        40.54%  inline   libm-2.23.so         [.] __hypot_finite
> >>>>        
> >>>>                ---__hypot_finite
> >>>>                
> >>>>                    --40.09%--__hypot
> >>>>                    
> >>>>                              main
> >>>>                              __libc_start_main
> >>>>                              _start
> >>>>         
> >>>>         1.30%  inline   libm-2.23.so         [.] __hypot
> >>>>         
> >>>>                ---__hypot
> >>>>                
> >>>>                   main
> >>>>                   
> >>>>                   ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>>>                   
> >>>>                      /usr/include/c++/5/bits/random.h:332 (inline)
> >>>>                      /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:185 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:1818 (inline)
> >>>>                      /usr/include/c++/5/bits/random.h:1809 (inline)
> >>>> 
> >>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >>>> 
> >>>>                   __libc_start_main
> >>>>                   _start
> >>>> 
> >>>> The inline entry is tagged with "(inline)". Is this result within
> >>>> expectation?
> >>> 
> >>> No, this is not the expected output. I expect to get information about
> >>> the
> >>> inlined frames in the middle of the call stack, in this example that
> >>> happens for the first two major hotspots.
> >>> 
> >>> Below is the output from my approach to this problem using my -g
> >>> srcline:
> >>> 
> >>> ~~~~~~~~~~~~~~ pasted here for readability:
> >>> https://paste.kde.org/pmq3bc1le
> >>> Samples: 149  of event 'cycles:ppp', Event count (approx.): 141334499
> >>> 
> >>>     Overhead  Command  Shared Object     Symbol
> >>> 
> >>> -   62.24%  inline   inline            [.] main
> >>> 
> >>>      - 27.98% std::generate_canonical<double, 53ul,
> >>> 
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>> 
> >>>> random.tcc:3332
> >>>> 
> >>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsign
> >>>           ed
> >>>           long,
> >>> 
> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>> 
> >>>           std::uniform_real_distribution<double>::operator()<std::linear
> >>>           _co
> >>>           ngruential_engine<unsigned>
> >>> 
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>> 
> >>>           std::uniform_real_distribution<double>::operator()<std::linear
> >>>           _co
> >>>           ngruential_engine<unsigned>
> >>> 
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>> 
> >>>           main ine.cpp:14
> >>>           __libc_start_main +241
> >>>           _start +4194346
> >>>      
> >>>      - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
> >>>      0ul,
> >>> 
> >>> true, true>::__calc random.h:143
> >>> 
> >>>           std::__detail::__mod<unsigned long, 2147483647ul, 16807ul,
> >>>           0ul>
> >>> 
> >>> random.h:151
> >>> 
> >>>           std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 
> >>> 2147483647ul>::operator() random.h:332
> >>> 
> >>>           std::generate_canonical<double, 53ul,
> >>> 
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>> 
> >>>> random.tcc:3332
> >>>> 
> >>>           std::__detail::_Adaptor<std::linear_congruential_engine<unsign
> >>>           ed
> >>>           long,
> >>> 
> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>> 
> >>>           std::uniform_real_distribution<double>::operator()<std::linear
> >>>           _co
> >>>           ngruential_engine<unsigned>
> >>> 
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>> 
> >>>           std::uniform_real_distribution<double>::operator()<std::linear
> >>>           _co
> >>>           ngruential_engine<unsigned>
> >>> 
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>> 
> >>>           main ine.cpp:14
> >>>           __libc_start_main +241
> >>>           _start +4194346
> >>>      
> >>>      + 4.14% main inline.cpp:13
> >>>      + 2.89%
> >>> 
> >>> std::uniform_real_distribution<double>::operator()<std::linear_congruent
> >>> ia
> >>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>> 
> >>>      + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> >>>      + 0.67% std::generate_canonical<double, 53ul,
> >>> 
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>> 
> >>>> random.tcc:3326
> >>> 
> >>> -   26.24%  inline   libm-2.24.so      [.] __hypot_finite
> >>> 
> >>>      - __hypot_finite +179
> >>>      
> >>>         - 25.56% hypot +20
> >>>         
> >>>              std::__complex_abs complex:589
> >>>              std::abs<double> complex:597
> >>>              std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>>              std::norm<double> complex:664
> >>>              main ine.cpp:14
> >>>              __libc_start_main +241
> >>>              _start +4194346
> >>>           
> >>>           0.68% 0x40ead238ed1a2cf4
> >>> 
> >>> -    6.85%  inline   libm-2.24.so      [.] hypot
> >>> 
> >>>        hypot +40
> >>>        std::__complex_abs complex:589
> >>>        std::abs<double> complex:597
> >>>        std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>>        std::norm<double> complex:664
> >>>        main ine.cpp:14
> >>>        __libc_start_main +241
> >>>        _start +4194346
> >>> 
> >>> ~~~~~~~~~~~~~~
> >>> 
> >>> I hope you see the value of such a report versus what you are proposing.
> >>> 
> >>> Thanks


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-01-13  8:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff
2017-01-09  3:07 ` Jin, Yao
2017-01-09  8:02   ` Milian Wolff
2017-01-09  8:17     ` Jin, Yao
2017-01-09  8:41       ` Milian Wolff
2017-01-10  1:53         ` Jin, Yao
2017-01-10 15:03           ` Milian Wolff
2017-01-11  1:01             ` Jin, Yao
2017-01-13  8:56               ` Milian Wolff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).