linux-perf-users.vger.kernel.org archive mirror
* usability issues with inlining and backtraces
@ 2016-05-23  9:10 Milian Wolff
  2016-05-23 14:11 ` Arnaldo Carvalho de Melo
  2016-05-23 14:59 ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Milian Wolff @ 2016-05-23  9:10 UTC (permalink / raw)
  To: linux-perf-users

Hey all,

I've recently encountered some nasty issues with perf which, I hope, can be 
fixed.

First of all, I learned about `perf report -s sym,srcline`, most notably the 
`srcline`, which finally (!) allows me to correlate sample cost directly to 
line numbers, without trying to interpret the often quite convoluted output of 
`perf annotate`. But, take the following example code:

~~~~~~~~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 1000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
~~~~~~~~~~~~

Compile it with

    g++ -std=c++11 -g -O2 test.cpp

Then profile the resulting binary (a.out) with perf:

    perf record --call-graph dwarf

########## perf report backtraces in the face of inlining

Now let us go through the output of the report with srcline enabled:

~~~~~~~~~~~~
    perf report --stdio -s sym,srcline --no-children

    21.07%  [.] main                   random.tcc:3332
            |
            ---__libc_start_main
               _start
    20.30%  [.] main                   random.h:143
            |
            ---__libc_start_main
               _start
~~~~~~~~~~~~

Here, the cost of inlined functions (random.tcc:3332, random.h:143) is 
attributed to the main function. This is of course correct, but very unhelpful 
to me as a programmer. I'm much more interested in the line inside test.cpp 
which triggered the call to random.tcc:3332 etc. Is there a way to get that 
data? Note how you get a far better backtrace when using GDB on the same 
binary as above:

~~~~~~~~~~~~
    gdb a.out
    break random.tcc:3332
    run
Breakpoint 1, std::generate_canonical<double, 53ul, 
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
(__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
3332                  __sum += _RealType(__urng() - __urng.min()) * __tmp;
(gdb) bt
#0  std::generate_canonical<double, 53ul, 
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
(__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
#1  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 
16807ul, 0ul, 2147483647ul>, double>::operator() (this=<optimized out>)
    at /usr/include/c++/6.1.1/bits/random.h:185
#2  
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > (this=<optimized out>, __p=..., 
__urng=...)
    at /usr/include/c++/6.1.1/bits/random.h:1818
#3  
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
long, 16807ul, 0ul, 2147483647ul> > (__urng=<synthetic pointer>, 
this=<synthetic pointer>)
    at /usr/include/c++/6.1.1/bits/random.h:1809
#4  main () at test.cpp:14
~~~~~~~~~~~~

Why is perf (or is it libunwind?) not capable of getting the same backtrace?

######### srcline in perf report backtraces

If we continue with the `perf report` output from the command above, we'll
eventually see:

~~~~~~~~~~~~

     4.60%  [.] main                   test.cpp:14
            |
            ---__libc_start_main
               _start

     3.63%  [.] __hypot_finite         __hypot_finite+257
            |
            ---hypot
               main
               __libc_start_main
               _start

     3.25%  [.] hypot                  hypot+55
            |
            ---main
               __libc_start_main
               _start

     3.21%  [.] __hypot_finite         __hypot_finite+152
            |
            ---hypot
               main
               __libc_start_main
               _start

     2.63%  [.] __hypot_finite         __hypot_finite+66
            |
            ---hypot
               main
... continued
~~~~~~~~~

Here, I have again a few things to note:

On the one hand, we have the same issue as above, namely inlined functions 
being attributed directly to the parent function. See how the backtrace shows 
main calling hypot? Looking at the source, I'm not calling hypot anywhere - 
it's std::norm that eventually calls it internally. And GDB does know about 
that and can give me a proper backtrace.

But, differently from above, the major gripe I have with this output is 
exemplified by this part:

~~~~~~~~~
     3.63%  [.] __hypot_finite         __hypot_finite+257
            |
            ---hypot
               main
               __libc_start_main
               _start
~~~~~~~~~

I added `srcline` to the report, but the backtrace still only contains the 
symbol name. I hope that we can simply honor srcline there as well, to at 
least print something like this instead:

~~~~~~~~~
     3.63%  [.] __hypot_finite         __hypot_finite+257
            |
            ---hypot
               main at test.cpp:14 ### <-- srcline here as well please?
               __libc_start_main
               _start
~~~~~~~~~

Is there a reason why this has not been implemented yet? If not, I'll prepare 
a patch for it, as I expect this to be easy to implement.
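
To make the idea concrete, here is a rough standalone C++ sketch of the 
formatting rule I have in mind - this is not perf code, just an illustration 
using values from the report above, with an empty srcline standing in for 
"could not be resolved":

~~~~~~~~~
#include <cstdint>
#include <iostream>
#include <string>

// Format one backtrace frame: prefer "sym at file:line" when a srcline could
// be resolved, otherwise keep the current "sym+offset" fallback.
std::string formatFrame(const std::string &symbol, uint64_t offset,
                        const std::string &srcline)
{
    if (!srcline.empty())
        return symbol + " at " + srcline;
    return symbol + "+" + std::to_string(offset);
}

int main()
{
    // srcline resolved, e.g. for frames in our own binary
    std::cout << formatFrame("main", 0, "test.cpp:14") << '\n';
    // no srcline available, e.g. when debug information is missing
    std::cout << formatFrame("__hypot_finite", 257, "") << '\n';
    return 0;
}
~~~~~~~~~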

########## symbol cost aggregation

Once we have that implemented, can we maybe also account for the following: I 
told `perf report` to aggregate by symbol first, then srcline, i.e. `-s 
sym,srcline`. But the report seems to aggregate by `srcline` only, because I 
see some symbols (__hypot_finite) multiple times, for different code points. 
Can we merge those and then start the backtraces at the different code points? 
Something like this would be my desired output:

~~~~~~~~
# see above for what I actually get, here I'm accumulating manually
$ perf report --stdio -s sym,srcline --no-children |grep __hypot_finite+ | awk 
'{sum+=$1}END{print sum}'
34.96
# this is what I would like to get directly
$ perf report --stdio -s sym,srcline --no-children 
...
34.96%  [.] __hypot_finite
            |
     3.63%  | __hypot_finite         __hypot_finite+257
            - ...
     3.21%  | __hypot_finite         __hypot_finite+152
            - ...
     ...
~~~~~~~~~
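
To illustrate the grouping I mean, here is a small standalone C++ sketch - 
nothing to do with perf internals - that merges the per-address entries under 
their symbol and prints the breakdown below the merged total. Only the entries 
visible in the excerpt above are included, so the totals are smaller than the 
34.96% computed with awk:

~~~~~~~~~
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // per-address overhead as shown in the report excerpt above:
    // (symbol, offset within symbol) -> overhead in percent
    std::vector<std::pair<std::pair<std::string, int>, double>> entries = {
        {{"__hypot_finite", 257}, 3.63},
        {{"__hypot_finite", 152}, 3.21},
        {{"__hypot_finite", 66}, 2.63},
        {{"hypot", 55}, 3.25},
    };

    // group by symbol while keeping the per-offset breakdown
    std::map<std::string, std::vector<std::pair<int, double>>> bySymbol;
    for (const auto &entry : entries)
        bySymbol[entry.first.first].push_back({entry.first.second, entry.second});

    for (const auto &group : bySymbol) {
        double total = 0;
        for (const auto &point : group.second)
            total += point.second;
        std::cout << total << "%  [.] " << group.first << '\n';
        for (const auto &point : group.second)
            std::cout << "    " << point.second << "%  " << group.first
                      << "+" << point.first << '\n';
    }
    return 0;
}
~~~~~~~~~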

What do you think?

####### perf annotate for mere mortals

To conclude this already long email, I want to raise the mess that is `perf 
annotate`. If I run it on my data file from above, I get the following output:

~~~~~~~~~
  1.23 │       mulsd  0x2f2(%rip),%xmm0        # 400bd0 <_IO_stdin_used+0x20>
  2.47 │       subsd  0x2f2(%rip),%xmm0        # 400bd8 <_IO_stdin_used+0x28>
       │     _ZSt13__complex_absCd():
       │     #if _GLIBCXX_USE_C99_COMPLEX
       │       inline float
       │       __complex_abs(__complex__ float __z) { return 
__builtin_cabsf(__z); }
       │
       │       inline double
       │       __complex_abs(__complex__ double __z) { return 
__builtin_cabs(__z); }
  1.23 │     → callq  cabs@plt
       │     _ZNSt12_Norm_helperILb1EE8_S_do_itIdEET_RKSt7complexIS2_E():
       │         {
       │           template<typename _Tp>
       │             static inline _Tp _S_do_it(const complex<_Tp>& __z)
       │             {
       │               _Tp __res = std::abs(__z);
       │               return __res * __res;
       │       mulsd  %xmm0,%xmm0
       │     main():
       │         for (int i = 0; i < 1000000; ++i) {
  1.23 │       sub    $0x1,%ebp
       │             s += norm(complex<double>(uniform(engine), 
uniform(engine)));
  7.41 │       addsd  0x8(%rsp),%xmm0
       │       movsd  %xmm0,0x8(%rsp)
       │     int main()
       │     {
       │         uniform_real_distribution<double> uniform(-1E5, 1E5);
       │         default_random_engine engine;
       │         double s = 0;
       │         for (int i = 0; i < 1000000; ++i) {
       │     ↓ je     24e
       │     
_ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm():
       │           struct _Mod<_Tp, __m, __a, __c, true, __s>
       │           {
       │             static _Tp
       │             __calc(_Tp __x)
       │             {
       │               _Tp __res = __a * __x + __c;
       │194:   imul   $0x41a7,%rbx,%rbx

~~~~~~~

I personally find this output extremely hard to interpret. I see the value of 
such output when you want to look at the cost of hot computational kernels, 
but for most user-space applications it is pretty much useless. Most notably, 
it shows self cost only, and I could not find a way to trigger an 
inclusive-cost visualization. This makes it look as if the addition of the 
complex norms were the most costly operation in my code. That is not true: the 
inclusive cost of the random engine, as well as of actually computing the 
norm, is far higher, i.e. I see

   1.23 │     → callq  cabs@plt

but from the output of `perf report` above, that call triggers the calls to 
`__hypot`, which account for close to 40% of the cost.

As such, I propose a "simplified" `perf annotate` output which gets closer to 
what you'll see in other profilers, e.g. VTune or Microsoft Visual Studio's 
sampling profiler: use the source files and annotate them with inclusive cost. 
Only show the cost of individual binary instructions (the status quo) when 
explicitly asked. I.e. I want the output to look something like this (I made 
up the actual percentages):

~~~~~~~~~
        #include <complex>
        #include <cmath>
        #include <random>
        #include <iostream>

        using namespace std;

        int main()
        {
1%          uniform_real_distribution<double> uniform(-1E5, 1E5);
1%          default_random_engine engine;
            double s = 0;
1%          for (int i = 0; i < 1000000; ++i) {
95%            s += norm(complex<double>(uniform(engine), uniform(engine)));
            }
2%          cout << s << '\n';
            return 0;
        }
~~~~~~~~~
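
To make the proposal more concrete, here is a minimal standalone C++ sketch of 
the presentation side only: given per-line inclusive percentages for test.cpp 
(hard-coded and made up here, just like in the mockup above), print the 
annotated source. The actual work, of course, is producing these inclusive 
per-line numbers from the samples in the first place:

~~~~~~~~~
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
#include <string>

int main()
{
    // hypothetical inclusive cost per line of test.cpp, made up just like the
    // percentages in the mockup above; a real tool would aggregate these by
    // walking every sampled callchain up to the first frame inside test.cpp
    const std::map<int, double> inclusiveCost = {
        {10, 1}, {11, 1}, {13, 1}, {14, 95}, {16, 2},
    };

    std::ifstream source("test.cpp");
    std::string line;
    int lineNumber = 0;
    while (std::getline(source, line)) {
        ++lineNumber;
        const auto it = inclusiveCost.find(lineNumber);
        if (it != inclusiveCost.end())
            std::cout << std::setw(2) << it->second << "%  ";
        else
            std::cout << "     ";
        std::cout << line << '\n';
    }
    return 0;
}
~~~~~~~~~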

Bonus points if I could then navigate to a source line and dig deeper (e.g. 
see the instructions for this line and then jump to a function that is being 
called).

Thanks a lot for reading this far. Please note that I really like perf; I 
just have trouble teaching people to use it, as many things are really hard, 
and I hope we can work together to make perf much easier to use!

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-05-23  9:10 usability issues with inlining and backtraces Milian Wolff
@ 2016-05-23 14:11 ` Arnaldo Carvalho de Melo
  2016-08-15 11:44   ` Milian Wolff
  2016-05-23 14:59 ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-23 14:11 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

On Mon, May 23, 2016 at 11:10:00AM +0200, Milian Wolff wrote:
> Hey all,
> 
> I've recently encountered some nasty issues with perf which should - I hope - 
> be possible to be fixed.

I've quickly read this, all interesting, but I need some time to digest
it fully, which I don't have right now; I'll get to it as soon as I can.

The fact that other open source tools do it as you expect is a precious
observation, as we can go look at how they solved problems.

Yeah, agreed that we need to make these tools more useful for userland
developers and, as much as we can, connect events to source code.

Hopefully some other people will participate.

- Arnaldo
 
> First of all, I learned about `perf report -s sym,srcline`, most notably the 
> `srcline`, which finally (!) allows me to correlate sample cost directly to 
> line numbers, without trying to interpret the often quite convoluted output of 
> `perf annotate`. But, take the following example code:
> 
> ~~~~~~~~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 1000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ~~~~~~~~~~~~
> 
> Compile it with
> 
>     g++ -std=c++11 -g -O2 test.cpp
> 
> Then profile it with perf:
> 
>     perf record --call-graph dwarf
> 
> ########## perf report backtraces in face of inlining
> 
> Now let us go through the the output of the report with srcline enabled:
> 
> ~~~~~~~~~~~~
>     perf report --stdio -s sym,srcline --no-children
> 
>     21.07%  [.] main                   random.tcc:3332
>             |
>             ---__libc_start_main
>                _start
>     20.30%  [.] main                   random.h:143
>             |
>             ---__libc_start_main
>                _start
> ~~~~~~~~~~~~
> 
> Here, the cost of inlined functions (random.tcc:3332,random.h:143) is 
> attributed to the main function. This is of course correct, but very unhelpful 
> to me as a programmer. I'm much more interested in the line inside test.cpp 
> which triggered the call to random.tcc:3332 etc. Is there a way to get that 
> data? Note how you can a way better backtrace when using GDB on the same 
> binary as above: 
> 
> ~~~~~~~~~~~~
>     gdb a.out
>     break random.tcc:3332
>     run
> Breakpoint 1, std::generate_canonical<double, 53ul, 
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
> (__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
> 3332                  __sum += _RealType(__urng() - __urng.min()) * __tmp;
> (gdb) bt
> #0  std::generate_canonical<double, 53ul, 
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > 
> (__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
> #1  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 
> 16807ul, 0ul, 2147483647ul>, double>::operator() (this=<optimized out>)
>     at /usr/include/c++/6.1.1/bits/random.h:185
> #2  
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
> long, 16807ul, 0ul, 2147483647ul> > (this=<optimized out>, __p=..., 
> __urng=...)
>     at /usr/include/c++/6.1.1/bits/random.h:1818
> #3  
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned 
> long, 16807ul, 0ul, 2147483647ul> > (__urng=<synthetic pointer>, 
> this=<synthetic pointer>)
>     at /usr/include/c++/6.1.1/bits/random.h:1809
> #4  main () at test.cpp:14
> ~~~~~~~~~~~~
> 
> Why is perf (or is it libunwind?) not capable of getting the same backtrace?
> 
> ######### srcline in perf report backtraces
> 
> If we continue with the `perf report` output from the command above, we'll
> eventually see:
> 
> ~~~~~~~~~~~~
> 
>      4.60%  [.] main                   test.cpp:14
>             |
>             ---__libc_start_main
>                _start
> 
>      3.63%  [.] __hypot_finite         __hypot_finite+257
>             |
>             ---hypot
>                main
>                __libc_start_main
>                _start
> 
>      3.25%  [.] hypot                  hypot+55
>             |
>             ---main
>                __libc_start_main
>                _start
> 
>      3.21%  [.] __hypot_finite         __hypot_finite+152
>             |
>             ---hypot
>                main
>                __libc_start_main
>                _start
> 
>      2.63%  [.] __hypot_finite         __hypot_finite+66
>             |
>             ---hypot
>                main
> ... continued
> ~~~~~~~~~
> 
> Here, I have again a few things to note:
> 
> On one hand, we have the same issue as above, namely inlined functions being 
> attributed directly to the parent function. See how the backtrace shows main 
> calling hypot? Look at the source, I'm not calling hypot anywhere - it's 
> std::norm calling it internally eventually. And GDB does know about that and 
> can give me a proper backtrace.
> 
> But, differently to above, the major gripe I have with this output is 
> exemplified by this part:
> 
> ~~~~~~~~~
>      3.63%  [.] __hypot_finite         __hypot_finite+257
>             |
>             ---hypot
>                main
>                __libc_start_main
>                _start
> ~~~~~~~~~
> 
> I added `srcline` to the report, but the backtrace still only contains the 
> symbol name. I hope, that we can simply honor srcline there as well, to at 
> least print something like this instead:
> 
> ~~~~~~~~~
>      3.63%  [.] __hypot_finite         __hypot_finite+257
>             |
>             ---hypot
>                main at test.cpp:14 ### <-- srcline here as well please?
>                __libc_start_main
>                _start
> ~~~~~~~~~
> 
> Is there a reason why this was not yet implemented? If not, then I'll prepare 
> a patch for that, as I hope this to be easy to implement.
> 
> ########## symbol cost aggregation
> 
> Once we have that implemented, can we maybe account for the following: I told 
> `perf report` to aggregate by symbol first, then srcline, i.e. `-s 
> sym,srcline`. But the report seems to aggregate by `srcline`, because I see 
> some symbols (__hypot_finite) multiple times, for different code points. Can 
> we merge those and start the backtrace then at the different code points? 
> Something like this would be my desirable output:
> 
> ~~~~~~~~
> # see above for what I actually get, here I'm accumulating manually
> $ perf report --stdio -s sym,srcline --no-children |grep __hypot_finite+ | awk 
> '{sum+=$1}END{print sum}'
> 34.96
> # this is what I would like to get directly
> $ perf report --stdio -s sym,srcline --no-children 
> ...
> 34.96%  [.] __hypot_finite
>             |
>      3.63%  | __hypot_finite         __hypot_finite+257
>             - ...
>      3.21%  | __hypot_finite         __hypot_finite+152
>             - ...
>      ...
> ~~~~~~~~~
> 
> What do you think?
> 
> ####### perf annotate for mere mortals
> 
> To conclude this already long email, I want to raise the mess that is `perf 
> annotate`. If I run it on my above data file, I get the following output:
> 
> ~~~~~~~~~
>   1.23 │       mulsd  0x2f2(%rip),%xmm0        # 400bd0 <_IO_stdin_used+0x20>
>   2.47 │       subsd  0x2f2(%rip),%xmm0        # 400bd8 <_IO_stdin_used+0x28>
>        │     _ZSt13__complex_absCd():
>        │     #if _GLIBCXX_USE_C99_COMPLEX
>        │       inline float
>        │       __complex_abs(__complex__ float __z) { return 
> __builtin_cabsf(__z); }
>        │
>        │       inline double
>        │       __complex_abs(__complex__ double __z) { return 
> __builtin_cabs(__z); }
>   1.23 │     → callq  cabs@plt
>        │     _ZNSt12_Norm_helperILb1EE8_S_do_itIdEET_RKSt7complexIS2_E():
>        │         {
>        │           template<typename _Tp>
>        │             static inline _Tp _S_do_it(const complex<_Tp>& __z)
>        │             {
>        │               _Tp __res = std::abs(__z);
>        │               return __res * __res;
>        │       mulsd  %xmm0,%xmm0
>        │     main():
>        │         for (int i = 0; i < 1000000; ++i) {
>   1.23 │       sub    $0x1,%ebp
>        │             s += norm(complex<double>(uniform(engine), 
> uniform(engine)));
>   7.41 │       addsd  0x8(%rsp),%xmm0
>        │       movsd  %xmm0,0x8(%rsp)
>        │     int main()
>        │     {
>        │         uniform_real_distribution<double> uniform(-1E5, 1E5);
>        │         default_random_engine engine;
>        │         double s = 0;
>        │         for (int i = 0; i < 1000000; ++i) {
>        │     ↓ je     24e
>        │     
> _ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm():
>        │           struct _Mod<_Tp, __m, __a, __c, true, __s>
>        │           {
>        │             static _Tp
>        │             __calc(_Tp __x)
>        │             {
>        │               _Tp __res = __a * __x + __c;
>        │194:   imul   $0x41a7,%rbx,%rbx
> 
> ~~~~~~~
> 
> I personally think this output is extremely hard to interpret. I see the value 
> of the above output when you want to look at the cost of hot computational 
> kernels, but for most user space applications the above is pretty much 
> useless. Most notably, note that the above shows self cost only and I could 
> not find a way to trigger inclusive cost visualization. It makes me believe 
> that the addition of the complex norms is the most costly operation in my 
> code. That is not true, the inclusive cost of the random engine as well as 
> actually computing the norm is way higher, i.e. I see
> 
>    1.23 │     → callq  cabs@plt
> 
> but from the output of `perf report` above that triggers the calls to 
> `__hypot` which have close to 40% of the cost.
> 
> As such, I propose a "simplified" `perf annotate` output, which gets closer to 
> what you'll see in other profilers, e.g. VTune or also Microsoft Visual 
> studio's sampling profilers: Use the source files and annotate those with 
> inclusive cost. Only show the cost of individual binary instructions (status 
> quo) when explicitly asked. I.e. I want the output to look something like that 
> (I made up the actual percentages):
> 
> ~~~~~~~~~
>         #include <complex>
>         #include <cmath>
>         #include <random>
>         #include <iostream>
> 
>         using namespace std;
> 
>         int main()
>         {
> 1%          uniform_real_distribution<double> uniform(-1E5, 1E5);
> 1%          default_random_engine engine;
>             double s = 0;
> 1%          for (int i = 0; i < 1000000; ++i) {
> 95%            s += norm(complex<double>(uniform(engine), uniform(engine)));
>             }
> 2%          cout << s << '\n';
>             return 0;
>         }
> ~~~~~~~~~
> 
> Bonus points if I could then navigate to a source line and dig deeper (i.e. 
> see the instructions for this line and then jump to a function that is being 
> called e.g.).
> 
> Thanks a lot for reading this far. Please note that I really like perf, I just 
> have trouble teaching people using it as many things are really really hard 
> and I hope we can work together to make perf much easier to use!
> 
> Thanks
> -- 
> Milian Wolff | milian.wolff@kdab.com | Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-05-23  9:10 usability issues with inlining and backtraces Milian Wolff
  2016-05-23 14:11 ` Arnaldo Carvalho de Melo
@ 2016-05-23 14:59 ` Andi Kleen
  2016-05-23 15:18   ` Milian Wolff
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2016-05-23 14:59 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Milian Wolff <milian.wolff@kdab.com> writes:
> Here, the cost of inlined functions (random.tcc:3332,random.h:143) is 
> attributed to the main function. This is of course correct, but very unhelpful 
> to me as a programmer. I'm much more interested in the line inside test.cpp 
> which triggered the call to random.tcc:3332 etc. Is there a way to get that 
> data? Note how you can a way better backtrace when using GDB on the same 
> binary as above:

Yes this works using --call-graph ...,address

If you use srcfile it can resolve inlines. It is currently not supported
for other backtraces, but yes, expanding them in the normal history
would be useful.

> On one hand, we have the same issue as above, namely inlined functions being 
> attributed directly to the parent function. See how the backtrace shows main 
> calling hypot? Look at the source, I'm not calling hypot anywhere - it's 
> std::norm calling it internally eventually. And GDB does know about that and 
> can give me a proper backtrace.

Use srcfile then.

>
> But, differently to above, the major gripe I have with this output is 
> exemplified by this part:
>
> ~~~~~~~~~
>      3.63%  [.] __hypot_finite         __hypot_finite+257
>             |
>             ---hypot
>                main
>                __libc_start_main
>                _start
> ~~~~~~~~~
>
> I added `srcline` to the report, but the backtrace still only contains the 
> symbol name. I hope, that we can simply honor srcline there as well, to at 
> least print something like this instead:

--call-graph ....,address

(or --branch-history)

> ########## symbol cost aggregation
>
> Once we have that implemented, can we maybe account for the following: I told 
> `perf report` to aggregate by symbol first, then srcline, i.e. `-s 
> sym,srcline`. But the report seems to aggregate by `srcline`, because I see 
> some symbols (__hypot_finite) multiple times, for different code points. Can 
> we merge those and start the backtrace then at the different code points? 
> Something like this would be my desirable output:

perf sorts by address, but the compiler can generate the same symbol
in multiple versions or in many inlined instances. That's probably
more work to fix.

> As such, I propose a "simplified" `perf annotate` output, which gets closer to 
> what you'll see in other profilers, e.g. VTune or also Microsoft Visual 
> studio's sampling profilers: Use the source files and annotate those with 
> inclusive cost. Only show the cost of individual binary instructions (status 
> quo) when explicitly asked. I.e. I want the output to look something like that 
> (I made up the actual percentages):

Sounds like the fallacy of abstracted performance analysis.

-Andi


* Re: usability issues with inlining and backtraces
  2016-05-23 14:59 ` Andi Kleen
@ 2016-05-23 15:18   ` Milian Wolff
  2016-06-13 16:07     ` Milian Wolff
  0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-05-23 15:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > attributed to the main function. This is of course correct, but very
> > unhelpful to me as a programmer. I'm much more interested in the line
> > inside test.cpp which triggered the call to random.tcc:3332 etc. Is there
> > a way to get that data? Note how you can a way better backtrace when
> > using GDB on the same
> > binary as above:
>
> Yes this works using --call-graph ...,address
> 
> If you use srcfile it can resolve inlines. it is currently not supported
> for other backtraces, but yes expanding them in the normal history
> would be useful.

Can you expand on this part a bit, please? This is the first time I've 
realized that I can add a sort key to `--call-graph`, so at the very least 
this shows that the defaults are bad.

For the example I'm now trying this:

    $ perf report -g graph,address -s sym,srcline --no-children --stdio

# Overhead  Symbol                       Source:Line       
# ........  ...........................  ..................
#
    21.82%  [.] main                     random.tcc:3332
            |          
            |--3.84%--main +8388740
            |          __libc_start_main +140389813309681
            |          _start +8388649
            | ...
     6.29%  [.] __hypot_finite           __hypot_finite+152
            |
            ---hypot +140389819301908
               main +8388987
               __libc_start_main +140389813309681
               _start +8388649

-> much better, but the backtraces are still unusable. And my `perf report` 
only allows this:

    sort_key:       call graph sort key (function|address)

So we need to add support for "|srcline" here as well?

> > On one hand, we have the same issue as above, namely inlined functions
> > being attributed directly to the parent function. See how the backtrace
> > shows main calling hypot? Look at the source, I'm not calling hypot
> > anywhere - it's std::norm calling it internally eventually. And GDB does
> > know about that and can give me a proper backtrace.
> 
> Use srcfile then.

See above, can you please extend this answer a bit? How would you use srcfile, 
and where, to get the full backtrace? Is that maybe a patch that is not yet 
included in acme's perf/core branch?

> > But, differently to above, the major gripe I have with this output is
> > exemplified by this part:
> > 
> > ~~~~~~~~~
> > 
> >      3.63%  [.] __hypot_finite         __hypot_finite+257
> >      
> >             ---hypot
> >             
> >                main
> >                __libc_start_main
> >                _start
> > 
> > ~~~~~~~~~
> > 
> > I added `srcline` to the report, but the backtrace still only contains the
> > symbol name. I hope, that we can simply honor srcline there as well, to at
> 
> > least print something like this instead:
> --call-graph ....,address
> 
> (or --branch-history)


Both produce output like I've shown above, i.e. they always put down

  <symbol> +<offset>

instead of using the much more usable

  <symbol> <file>:<line>

format. What am I missing?

> > ########## symbol cost aggregation
> > 
> > Once we have that implemented, can we maybe account for the following: I
> > told `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> > sym,srcline`. But the report seems to aggregate by `srcline`, because I
> > see some symbols (__hypot_finite) multiple times, for different code
> > points. Can we merge those and start the backtrace then at the different
> > code points?
> > Something like this would be my desirable output:
>
> perf sorts by address, but the compiler can generate the same symbol
> in multiple versions or in may inlines instances. That's probably
> more work to fix.
> 
> > As such, I propose a "simplified" `perf annotate` output, which gets
> > closer to what you'll see in other profilers, e.g. VTune or also
> > Microsoft Visual studio's sampling profilers: Use the source files and
> > annotate those with inclusive cost. Only show the cost of individual
> > binary instructions (status quo) when explicitly asked. I.e. I want the
> > output to look something like that
> > (I made up the actual percentages):
>
> Sounds like the fallacy of abstracted performance analysis.

Can you please elaborate on that some more as well? For me, as a user-space 
application developer, getting an aggregated inclusive cost for the source 
code I actually write is extremely useful. Of course, when it's not directly 
clear from that view why something is slow, I still need to dig deeper and get 
to the current in-depth assembly view. But why is having a simplified overview 
a fallacy? Why is this abstracted performance analysis?

Thanks a lot for your input!
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-05-23 15:18   ` Milian Wolff
@ 2016-06-13 16:07     ` Milian Wolff
  2016-08-15  9:32       ` Milian Wolff
  0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-06-13 16:07 UTC (permalink / raw)
  To: andi; +Cc: linux-perf-users

Ping?

Andi, I'd be very interested in your answers to my questions below. Most 
notably, `srcline` with `--call-graph ...,address` seems to be highly buggy 
for me.

Thanks

On Monday, May 23, 2016 5:18:53 PM CEST Milian Wolff wrote:
> On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > > attributed to the main function. This is of course correct, but very
> > > unhelpful to me as a programmer. I'm much more interested in the line
> > > inside test.cpp which triggered the call to random.tcc:3332 etc. Is
> > > there
> > > a way to get that data? Note how you can a way better backtrace when
> > > using GDB on the same
> > 
> > > binary as above:
> > Yes this works using --call-graph ...,address
> > 
> > If you use srcfile it can resolve inlines. it is currently not supported
> > for other backtraces, but yes expanding them in the normal history
> > would be useful.
> 
> Can you expand on this part a bit please? It's the first time I realize that
> I can add a sort key to `--call-graph`, so at the very least this shows
> that the defaults are bad.
> 
> For the example I'm now trying this:
> 
>     $ perf report -g graph,address -s sym,srcline --no-children --stdio
> 
> # Overhead  Symbol                       Source:Line
> # ........  ...........................  ..................
> #
>     21.82%  [.] main                     random.tcc:3332
> 
>             |--3.84%--main +8388740
>             |
>             |          __libc_start_main +140389813309681
>             |          _start +8388649
>             | 
>             | ...
> 
>      6.29%  [.] __hypot_finite           __hypot_finite+152
> 
>             ---hypot +140389819301908
>                main +8388987
>                __libc_start_main +140389813309681
>                _start +8388649
> 
> -> much better, but still unusable backtraces. And my `perf report` only
> allows this:
> 
>     sort_key:       call graph sort key (function|address)
> 
> So we need to add support for "|srcline" here as well?
> 
> > > On one hand, we have the same issue as above, namely inlined functions
> > > being attributed directly to the parent function. See how the backtrace
> > > shows main calling hypot? Look at the source, I'm not calling hypot
> > > anywhere - it's std::norm calling it internally eventually. And GDB does
> > > know about that and can give me a proper backtrace.
> > 
> > Use srcfile then.
> 
> See above, can you please extend this answer a bit? How would you use
> srcfile, and where, to get the full backtrace? Is that maybe a patch that
> is not yet included in acme's perf/core branch?
> 
> > > But, differently to above, the major gripe I have with this output is
> > > exemplified by this part:
> > > 
> > > ~~~~~~~~~
> > > 
> > >      3.63%  [.] __hypot_finite         __hypot_finite+257
> > >      
> > >             ---hypot
> > >             
> > >                main
> > >                __libc_start_main
> > >                _start
> > > 
> > > ~~~~~~~~~
> > > 
> > > I added `srcline` to the report, but the backtrace still only contains
> > > the
> > > symbol name. I hope, that we can simply honor srcline there as well, to
> > > at
> > 
> > > least print something like this instead:
> > --call-graph ....,address
> > 
> > (or --branch-history)
> 
> Both produce output like I've shown above, i.e. they always put down
> 
>   <symbol> +<offset>
> 
> insetead of using the much more useable
> 
>   <symbol> <file>:<line>
> 
> format. What am I missing?
> 
> > > ########## symbol cost aggregation
> > > 
> > > Once we have that implemented, can we maybe account for the following: I
> > > told `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> > > sym,srcline`. But the report seems to aggregate by `srcline`, because I
> > > see some symbols (__hypot_finite) multiple times, for different code
> > > points. Can we merge those and start the backtrace then at the different
> > > code points?
> > 
> > > Something like this would be my desirable output:
> > perf sorts by address, but the compiler can generate the same symbol
> > in multiple versions or in may inlines instances. That's probably
> > more work to fix.
> > 
> > > As such, I propose a "simplified" `perf annotate` output, which gets
> > > closer to what you'll see in other profilers, e.g. VTune or also
> > > Microsoft Visual studio's sampling profilers: Use the source files and
> > > annotate those with inclusive cost. Only show the cost of individual
> > > binary instructions (status quo) when explicitly asked. I.e. I want the
> > > output to look something like that
> > 
> > > (I made up the actual percentages):
> > Sounds like the fallacy of abstracted performance analysis.
> 
> Can you please elaborate that some more as well? For me, as a user-space
> application developer, getting an aggregated inclusive cost for the source
> code I actually write is extremely useful. Of course, when it's not directly
> clear from that view why something is slow I still need to dig deeper and
> get to the current in-depth assembly view. But why is having a simplified
> overview a fallacy? Why is this abstracted performance analysis?
> 
> Thanks a lot for your input!


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-06-13 16:07     ` Milian Wolff
@ 2016-08-15  9:32       ` Milian Wolff
  2016-08-15 11:32         ` Milian Wolff
  0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-08-15  9:32 UTC (permalink / raw)
  To: andi; +Cc: linux-perf-users

On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> Ping?
> 
> Andi, I'd be very interested in your answers on my questions below. Most
> notably, `srcline` with `--call-graph ...,address` seems to be highly buggy
> for me.

Reping again.

Andi, or anyone else - I would really appreciate some more feedback on my 
questions below.

I especially wonder how to leverage srcline and addresses in the call graph, 
as that has the potential to greatly increase perf's value to me.

Thanks

> On Monday, May 23, 2016 5:18:53 PM CEST Milian Wolff wrote:
> > On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > > > attributed to the main function. This is of course correct, but very
> > > > unhelpful to me as a programmer. I'm much more interested in the line
> > > > inside test.cpp which triggered the call to random.tcc:3332 etc. Is
> > > > there
> > > > a way to get that data? Note how you can a way better backtrace when
> > > > using GDB on the same
> > > 
> > > > binary as above:
> > > Yes this works using --call-graph ...,address
> > > 
> > > If you use srcfile it can resolve inlines. it is currently not supported
> > > for other backtraces, but yes expanding them in the normal history
> > > would be useful.
> > 
> > Can you expand on this part a bit please? It's the first time I realize
> > that I can add a sort key to `--call-graph`, so at the very least this
> > shows that the defaults are bad.
> > 
> > For the example I'm now trying this:
> >     $ perf report -g graph,address -s sym,srcline --no-children --stdio
> > 
> > # Overhead  Symbol                       Source:Line
> > # ........  ...........................  ..................
> > #
> > 
> >     21.82%  [.] main                     random.tcc:3332
> >     
> >             |--3.84%--main +8388740
> >             |
> >             |          __libc_start_main +140389813309681
> >             |          _start +8388649
> >             | 
> >             | ...
> >      
> >      6.29%  [.] __hypot_finite           __hypot_finite+152
> >      
> >             ---hypot +140389819301908
> >             
> >                main +8388987
> >                __libc_start_main +140389813309681
> >                _start +8388649
> > 
> > -> much better, but still unusable backtraces. And my `perf report` only
> > 
> > allows this:
> >     sort_key:       call graph sort key (function|address)
> > 
> > So we need to add support for "|srcline" here as well?
> > 
> > > > On one hand, we have the same issue as above, namely inlined functions
> > > > being attributed directly to the parent function. See how the
> > > > backtrace
> > > > shows main calling hypot? Look at the source, I'm not calling hypot
> > > > anywhere - it's std::norm calling it internally eventually. And GDB
> > > > does
> > > > know about that and can give me a proper backtrace.
> > > 
> > > Use srcfile then.
> > 
> > See above, can you please extend this answer a bit? How would you use
> > srcfile, and where, to get the full backtrace? Is that maybe a patch that
> > is not yet included in acme's perf/core branch?
> > 
> > > > But, differently to above, the major gripe I have with this output is
> > > > exemplified by this part:
> > > > 
> > > > ~~~~~~~~~
> > > > 
> > > >      3.63%  [.] __hypot_finite         __hypot_finite+257
> > > >      
> > > >             ---hypot
> > > >             
> > > >                main
> > > >                __libc_start_main
> > > >                _start
> > > > 
> > > > ~~~~~~~~~
> > > > 
> > > > I added `srcline` to the report, but the backtrace still only contains
> > > > the
> > > > symbol name. I hope, that we can simply honor srcline there as well,
> > > > to
> > > > at
> > > 
> > > > least print something like this instead:
> > > --call-graph ....,address
> > > 
> > > (or --branch-history)
> > 
> > Both produce output like I've shown above, i.e. they always put down
> > 
> >   <symbol> +<offset>
> > 
> > insetead of using the much more useable
> > 
> >   <symbol> <file>:<line>
> > 
> > format. What am I missing?
> > 
> > > > ########## symbol cost aggregation
> > > > 
> > > > Once we have that implemented, can we maybe account for the following:
> > > > I
> > > > told `perf report` to aggregate by symbol first, then srcline, i.e.
> > > > `-s
> > > > sym,srcline`. But the report seems to aggregate by `srcline`, because
> > > > I
> > > > see some symbols (__hypot_finite) multiple times, for different code
> > > > points. Can we merge those and start the backtrace then at the
> > > > different
> > > > code points?
> > > 
> > > > Something like this would be my desirable output:
> > > perf sorts by address, but the compiler can generate the same symbol
> > > in multiple versions or in may inlines instances. That's probably
> > > more work to fix.
> > > 
> > > > As such, I propose a "simplified" `perf annotate` output, which gets
> > > > closer to what you'll see in other profilers, e.g. VTune or also
> > > > Microsoft Visual studio's sampling profilers: Use the source files and
> > > > annotate those with inclusive cost. Only show the cost of individual
> > > > binary instructions (status quo) when explicitly asked. I.e. I want
> > > > the
> > > > output to look something like that
> > > 
> > > > (I made up the actual percentages):
> > > Sounds like the fallacy of abstracted performance analysis.
> > 
> > Can you please elaborate that some more as well? For me, as a user-space
> > application developer, getting an aggregated inclusive cost for the source
> > code I actually write is extremely useful. Of course, when it's not
> > directly clear from that view why something is slow I still need to dig
> > deeper and get to the current in-depth assembly view. But why is having a
> > simplified overview a fallacy? Why is this abstracted performance
> > analysis?
> > 
> > Thanks a lot for your input!


-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-08-15  9:32       ` Milian Wolff
@ 2016-08-15 11:32         ` Milian Wolff
  2016-08-15 17:13           ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-08-15 11:32 UTC (permalink / raw)
  To: linux-perf-users

On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> > Ping?
> > 
> > Andi, I'd be very interested in your answers on my questions below. Most
> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> > buggy
> > for me.
> 
> Reping again.
> 
> Andi, or anyone else - I would really appreciate some more feedback on my
> questions below.
> 
> I esp. wonder how to leverage srcline and addresses in the call-graph, as
> that has the potential to greatly increase perf's value to me.

OK, I have found at least two culprits that partially explain the behavior.

On the one hand, I'm simply missing debug information for some libraries, 
such as libm, libc, etc. Because of that, the addr2line step fails and 
sym+offset is displayed.

On the other hand, there seems to be a bug in perf: 

When hist_entry__get_srcline calls get_srcline, I see from the debug output:

he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39

But in callchain_list__sym_name, I see:

cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413

So it seems like the callchain_list is missing a normalization step that is 
applied to the IP addresses stored in hist_entry. Can someone point me in the 
right direction?

If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to 
map__rip_2objdump, this feature works much better for me, but I doubt that 
this is the correct fix.
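
To illustrate with the numbers above, here is a tiny standalone C++ sketch of 
the arithmetic. The 0x400000 map start is inferred from the debug output 
(0x40307e - 0x307e), and the lambda only models the "+ map start" part of what 
map__rip_2objdump actually does:

~~~~~~~~~
#include <cstdint>
#include <iostream>

int main()
{
    // inferred from the debug output above: 0x40307e - 0x307e
    const uint64_t mapStart = 0x400000;
    const uint64_t heIp = 0x307e;   // hist_entry::ip, map-relative
    const uint64_t clIp = 0x40373d; // callchain_list::ip, absolute

    // only the "+ map start" step relevant here, not the real transformation
    auto rip2objdump = [=](uint64_t rip) { return rip + mapStart; };

    std::cout << std::hex
              << "hist_entry:       " << rip2objdump(heIp) << '\n'            // 40307e, resolves to a srcline
              << "callchain, as-is: " << rip2objdump(clIp) << '\n'            // 80373d, bogus -> sym+offset shown
              << "callchain, fixed: " << rip2objdump(clIp - mapStart) << '\n'; // 40373d, resolvable again
    return 0;
}
~~~~~~~~~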

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-05-23 14:11 ` Arnaldo Carvalho de Melo
@ 2016-08-15 11:44   ` Milian Wolff
  0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2016-08-15 11:44 UTC (permalink / raw)
  To: acme; +Cc: linux-perf-users

On Monday, May 23, 2016 11:11:04 AM CEST Arnaldo Carvalho de Melo wrote:
> Em Mon, May 23, 2016 at 11:10:00AM +0200, Milian Wolff escreveu:
> > Hey all,
> > 
> > I've recently encountered some nasty issues with perf which should - I
> > hope - be possible to be fixed.
> 
> I've quickly read this, all interesting, but I need some time to digest
> it fully, which I don't have right now, I'll get to it as soon as I can.
> 
> The fact that other open source tools do it as you expect is a precious
> observation, as we can go look at how they solved problems.

I have now found something very interesting in the perf code base:

The addr2line implementation with HAVE_LIBBFD_SUPPORT is actually capable of 
producing backtraces that include inliners, similar to what GDB can do. If I 
add debug output to the bfd_find_inliner_info loop inside addr2line in 
srcline.c, and force unwind_inlines to true, then I see e.g.:

~~~~~~~~~~
inline: 0 _ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm (/
usr/include/c++/6.1.1/bits/random.h:143)
inline: 1 _ZNSt8__detail5__modImLm2147483647ELm16807ELm0EEET_S1_ (/usr/
include/c++/6.1.1/bits/random.h:151)
inline: 2 _ZNSt26linear_congruential_engineImLm16807ELm0ELm2147483647EEclEv (/
usr/include/c++/6.1.1/bits/random.h:332)
inline: 3 
_ZSt18generate_canonicalIdLm53ESt26linear_congruential_engineImLm16807ELm0ELm2147483647EEET_RT1_ 
(/usr/include/c++/6.1.1/bits/random.tcc:3332)
inline: 4 
_ZNSt8__detail8_AdaptorISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEdEclEv 
(/usr/include/c++/6.1.1/bits/random.h:185)
inline: 5 
_ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_RKNS0_10param_typeE 
(/usr/include/c++/6.1.1/bits/random.h:1818)
inline: 6 
_ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_ 
(/usr/include/c++/6.1.1/bits/random.h:1809)
inline: 7 main (/path/to/test.cpp:14)
~~~~~~~~~~

I would really like to see this information included in the call graph shown 
by perf, e.g. in the stdio/browser output of perf report, but also in perf 
script etc.

Can someone suggest a way to do that? These entries cannot be represented by 
a normal instruction pointer address, so extending the existing callchains 
with additional entries is not straightforward.
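
One representation I could imagine - purely a sketch, not based on perf's 
actual callchain structures - is a frame type that carries either a real 
instruction pointer or a synthesized entry for an inlined function that only 
has a symbol and a file:line to show:

~~~~~~~~~
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One entry of an "expanded" callchain: either a real frame backed by an
// instruction pointer, or a synthesized frame for an inlined function.
struct Frame
{
    bool inlined;
    uint64_t ip;         // only meaningful for real frames
    std::string symbol;
    std::string srcline;
};

int main()
{
    // a shortened version of the inline chain above, innermost first,
    // ending in the real frame for main (the ip value is made up)
    const std::vector<Frame> chain = {
        {true, 0, "std::__detail::_Mod<...>::__calc", "random.h:143"},
        {true, 0, "std::generate_canonical<double, 53ul, ...>", "random.tcc:3332"},
        {true, 0, "std::uniform_real_distribution<double>::operator()", "random.h:1809"},
        {false, 0x1234, "main", "test.cpp:14"},
    };

    for (const auto &frame : chain) {
        std::cout << frame.symbol << " (" << frame.srcline << ")";
        if (frame.inlined)
            std::cout << " [inlined]";
        std::cout << '\n';
    }
    return 0;
}
~~~~~~~~~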

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: usability issues with inlining and backtraces
  2016-08-15 11:32         ` Milian Wolff
@ 2016-08-15 17:13           ` Andi Kleen
  2016-08-16 15:13             ` Namhyung Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2016-08-15 17:13 UTC (permalink / raw)
  To: Milian Wolff; +Cc: jolsa, namhyung, linux-perf-users

Milian Wolff <milian.wolff@kdab.com> writes:

> On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
>> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
>> > Ping?
>> > 
>> > Andi, I'd be very interested in your answers on my questions below. Most
>> > notably, `srcline` with `--call-graph ...,address` seems to be highly
>> > buggy
>> > for me.
>> 
>> Reping again.
>> 
>> Andi, or anyone else - I would really appreciate some more feedback on my
>> questions below.
>> 
>> I esp. wonder how to leverage srcline and addresses in the call-graph, as
>> that has the potential to greatly increase perf's value to me.
>
> OK, I have found at least two culprits that partially explain the behavior.
>
> On one hand, I'm simply missing debug information for some libraries, such as 
> libm, libc etc. pp. Due to that, the addr2line step fails and sym+offset is 
> displayed.
>
> On the other hand, there seems to be a bug in perf: 
>
> When hist_entry__get_srcline calls get_srcline I see from debug output
>
> he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
>
> But in callchain_list__sym_name, I see:
>
> cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
>
> So it seems like the callchain_list is missing a normalization step that is 
> applied to the IP addresses stored in hist_entry. Can someone point me into 
> the right direction?
>
> If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to 
> map__rip_2objdump this feature works much better for me, but I doubt that this 
> is the correct fix.

Yes, it's something like that. Namhyung or Jiri probably know
what the right fix would be.

-Andi


* Re: usability issues with inlining and backtraces
  2016-08-15 17:13           ` Andi Kleen
@ 2016-08-16 15:13             ` Namhyung Kim
  2016-08-16 15:45               ` Milian Wolff
  0 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2016-08-16 15:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Milian Wolff, jolsa, linux-perf-users

On Mon, Aug 15, 2016 at 10:13:24AM -0700, Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> 
> > On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> >> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> >> > Ping?
> >> > 
> >> > Andi, I'd be very interested in your answers on my questions below. Most
> >> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> >> > buggy
> >> > for me.
> >> 
> >> Reping again.
> >> 
> >> Andi, or anyone else - I would really appreciate some more feedback on my
> >> questions below.
> >> 
> >> I esp. wonder how to leverage srcline and addresses in the call-graph, as
> >> that has the potential to greatly increase perf's value to me.
> >
> > OK, I have found at least two culprits that partially explain the behavior.
> >
> > On one hand, I'm simply missing debug information for some libraries, such as 
> > libm, libc etc. pp. Due to that, the addr2line step fails and sym+offset is 
> > displayed.
> >
> > On the other hand, there seems to be a bug in perf: 
> >
> > When hist_entry__get_srcline calls get_srcline I see from debug output
> >
> > he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
> >
> > But in callchain_list__sym_name, I see:
> >
> > cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
> >
> > So it seems like the callchain_list is missing a normalization step that is 
> > applied to the IP addresses stored in hist_entry. Can someone point me into 
> > the right direction?
> >
> > If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to 
> > map__rip_2objdump this feature works much better for me, but I doubt that this 
> > is the correct fix.
> 
> Yes it's something like that. Namhyung or Jiri probably know
> what the right fix would be.

I think the unwind libs should set entry->ip to al.addr instead of
ip.  The __hists__add_entry() sets hist_entry->ip using al->addr.

Thanks,
Namhyung


* Re: usability issues with inlining and backtraces
  2016-08-16 15:13             ` Namhyung Kim
@ 2016-08-16 15:45               ` Milian Wolff
  0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2016-08-16 15:45 UTC (permalink / raw)
  To: linux-perf-users; +Cc: Namhyung Kim

On Wednesday, 17 August 2016 00:13:10 CEST Namhyung Kim wrote:
> On Mon, Aug 15, 2016 at 10:13:24AM -0700, Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> > >> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> > >> > Ping?
> > >> > 
> > >> > Andi, I'd be very interested in your answers on my questions below.
> > >> > Most
> > >> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> > >> > buggy
> > >> > for me.
> > >> 
> > >> Reping again.
> > >> 
> > >> Andi, or anyone else - I would really appreciate some more feedback on
> > >> my
> > >> questions below.
> > >> 
> > >> I esp. wonder how to leverage srcline and addresses in the call-graph,
> > >> as
> > >> that has the potential to greatly increase perf's value to me.
> > > 
> > > OK, I have found at least two culprits that partially explain the
> > > behavior.
> > > 
> > > On one hand, I'm simply missing debug information for some libraries,
> > > such as libm, libc etc. pp. Due to that, the addr2line step fails and
> > > sym+offset is displayed.
> > > 
> > > On the other hand, there seems to be a bug in perf:
> > > 
> > > When hist_entry__get_srcline calls get_srcline I see from debug output
> > > 
> > > he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
> > > 
> > > But in callchain_list__sym_name, I see:
> > > 
> > > cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
> > > 
> > > So it seems like the callchain_list is missing a normalization step that
> > > is
> > > applied to the IP addresses stored in hist_entry. Can someone point me
> > > into
> > > the right direction?
> > > 
> > > If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start
> > > to
> > > map__rip_2objdump this feature works much better for me, but I doubt
> > > that this is the correct fix.
> > 
> > Yes it's something like that. Namhyung or Jiri probably know
> > what the right fix would be.
> 
> I think the unwind libs should set entry->ip to al.addr instead of
> ip.  The __hists__add_entry() sets hist_entry->ip using al->addr.

Thank you, that works. I've submitted a patch accordingly.

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

