* usability issues with inlining and backtraces
@ 2016-05-23 9:10 Milian Wolff
2016-05-23 14:11 ` Arnaldo Carvalho de Melo
2016-05-23 14:59 ` Andi Kleen
0 siblings, 2 replies; 11+ messages in thread
From: Milian Wolff @ 2016-05-23 9:10 UTC (permalink / raw)
To: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 10777 bytes --]
Hey all,
I've recently encountered some nasty issues with perf which should - I hope -
be possible to be fixed.
First of all, I learned about `perf report -s sym,srcline`, most notably the
`srcline`, which finally (!) allows me to correlate sample cost directly to
line numbers, without trying to interpret the often quite convoluted output of
`perf annotate`. But, take the following example code:
~~~~~~~~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>
using namespace std;
int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 1000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
~~~~~~~~~~~~
Compile it with
g++ -std=c++11 -g -O2 test.cpp
Then profile it with perf:
perf record --call-graph dwarf
########## perf report backtraces in face of inlining
Now let us go through the the output of the report with srcline enabled:
~~~~~~~~~~~~
perf report --stdio -s sym,srcline --no-children
21.07% [.] main random.tcc:3332
|
---__libc_start_main
_start
20.30% [.] main random.h:143
|
---__libc_start_main
_start
~~~~~~~~~~~~
Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
attributed to the main function. This is of course correct, but very unhelpful
to me as a programmer. I'm much more interested in the line inside test.cpp
which triggered the call to random.tcc:3332 etc. Is there a way to get that
data? Note how you can a way better backtrace when using GDB on the same
binary as above:
~~~~~~~~~~~~
gdb a.out
break random.tcc:3332
run
Breakpoint 1, std::generate_canonical<double, 53ul,
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
(__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
3332 __sum += _RealType(__urng() - __urng.min()) * __tmp;
(gdb) bt
#0 std::generate_canonical<double, 53ul,
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
(__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
#1 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
16807ul, 0ul, 2147483647ul>, double>::operator() (this=<optimized out>)
at /usr/include/c++/6.1.1/bits/random.h:185
#2
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > (this=<optimized out>, __p=...,
__urng=...)
at /usr/include/c++/6.1.1/bits/random.h:1818
#3
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > (__urng=<synthetic pointer>,
this=<synthetic pointer>)
at /usr/include/c++/6.1.1/bits/random.h:1809
#4 main () at test.cpp:14
~~~~~~~~~~~~
Why is perf (or is it libunwind?) not capable of getting the same backtrace?
######### srcline in perf report backtraces
If we continue with the `perf report` output from the command above, we'll
eventually see:
~~~~~~~~~~~~
4.60% [.] main test.cpp:14
|
---__libc_start_main
_start
3.63% [.] __hypot_finite __hypot_finite+257
|
---hypot
main
__libc_start_main
_start
3.25% [.] hypot hypot+55
|
---main
__libc_start_main
_start
3.21% [.] __hypot_finite __hypot_finite+152
|
---hypot
main
__libc_start_main
_start
2.63% [.] __hypot_finite __hypot_finite+66
|
---hypot
main
... continued
~~~~~~~~~
Here, I have again a few things to note:
On one hand, we have the same issue as above, namely inlined functions being
attributed directly to the parent function. See how the backtrace shows main
calling hypot? Look at the source, I'm not calling hypot anywhere - it's
std::norm calling it internally eventually. And GDB does know about that and
can give me a proper backtrace.
But, differently to above, the major gripe I have with this output is
exemplified by this part:
~~~~~~~~~
3.63% [.] __hypot_finite __hypot_finite+257
|
---hypot
main
__libc_start_main
_start
~~~~~~~~~
I added `srcline` to the report, but the backtrace still only contains the
symbol name. I hope, that we can simply honor srcline there as well, to at
least print something like this instead:
~~~~~~~~~
3.63% [.] __hypot_finite __hypot_finite+257
|
---hypot
main at test.cpp:14 ### <-- srcline here as well please?
__libc_start_main
_start
~~~~~~~~~
Is there a reason why this was not yet implemented? If not, then I'll prepare
a patch for that, as I hope this to be easy to implement.
########## symbol cost aggregation
Once we have that implemented, can we maybe account for the following: I told
`perf report` to aggregate by symbol first, then srcline, i.e. `-s
sym,srcline`. But the report seems to aggregate by `srcline`, because I see
some symbols (__hypot_finite) multiple times, for different code points. Can
we merge those and start the backtrace then at the different code points?
Something like this would be my desirable output:
~~~~~~~~
# see above for what I actually get, here I'm accumulating manually
$ perf report --stdio -s sym,srcline --no-children |grep __hypot_finite+ | awk
'{sum+=$1}END{print sum}'
34.96
# this is what I would like to get directly
$ perf report --stdio -s sym,srcline --no-children
...
34.96% [.] __hypot_finite
|
3.63% | __hypot_finite __hypot_finite+257
- ...
3.21% | __hypot_finite __hypot_finite+152
- ...
...
~~~~~~~~~
What do you think?
####### perf annotate for mere mortals
To conclude this already long email, I want to raise the mess that is `perf
annotate`. If I run it on my above data file, I get the following output:
~~~~~~~~~
1.23 │ mulsd 0x2f2(%rip),%xmm0 # 400bd0 <_IO_stdin_used+0x20>
2.47 │ subsd 0x2f2(%rip),%xmm0 # 400bd8 <_IO_stdin_used+0x28>
│ _ZSt13__complex_absCd():
│ #if _GLIBCXX_USE_C99_COMPLEX
│ inline float
│ __complex_abs(__complex__ float __z) { return
__builtin_cabsf(__z); }
│
│ inline double
│ __complex_abs(__complex__ double __z) { return
__builtin_cabs(__z); }
1.23 │ → callq cabs@plt
│ _ZNSt12_Norm_helperILb1EE8_S_do_itIdEET_RKSt7complexIS2_E():
│ {
│ template<typename _Tp>
│ static inline _Tp _S_do_it(const complex<_Tp>& __z)
│ {
│ _Tp __res = std::abs(__z);
│ return __res * __res;
│ mulsd %xmm0,%xmm0
│ main():
│ for (int i = 0; i < 1000000; ++i) {
1.23 │ sub $0x1,%ebp
│ s += norm(complex<double>(uniform(engine),
uniform(engine)));
7.41 │ addsd 0x8(%rsp),%xmm0
│ movsd %xmm0,0x8(%rsp)
│ int main()
│ {
│ uniform_real_distribution<double> uniform(-1E5, 1E5);
│ default_random_engine engine;
│ double s = 0;
│ for (int i = 0; i < 1000000; ++i) {
│ ↓ je 24e
│
_ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm():
│ struct _Mod<_Tp, __m, __a, __c, true, __s>
│ {
│ static _Tp
│ __calc(_Tp __x)
│ {
│ _Tp __res = __a * __x + __c;
│194: imul $0x41a7,%rbx,%rbx
~~~~~~~
I personally think this output is extremely hard to interpret. I see the value
of the above output when you want to look at the cost of hot computational
kernels, but for most user space applications the above is pretty much
useless. Most notably, note that the above shows self cost only and I could
not find a way to trigger inclusive cost visualization. It makes me believe
that the addition of the complex norms is the most costly operation in my
code. That is not true, the inclusive cost of the random engine as well as
actually computing the norm is way higher, i.e. I see
1.23 │ → callq cabs@plt
but from the output of `perf report` above that triggers the calls to
`__hypot` which have close to 40% of the cost.
As such, I propose a "simplified" `perf annotate` output, which gets closer to
what you'll see in other profilers, e.g. VTune or also Microsoft Visual
studio's sampling profilers: Use the source files and annotate those with
inclusive cost. Only show the cost of individual binary instructions (status
quo) when explicitly asked. I.e. I want the output to look something like that
(I made up the actual percentages):
~~~~~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>
using namespace std;
int main()
{
1% uniform_real_distribution<double> uniform(-1E5, 1E5);
1% default_random_engine engine;
double s = 0;
1% for (int i = 0; i < 1000000; ++i) {
95% s += norm(complex<double>(uniform(engine), uniform(engine)));
}
2% cout << s << '\n';
return 0;
}
~~~~~~~~~
Bonus points if I could then navigate to a source line and dig deeper (i.e.
see the instructions for this line and then jump to a function that is being
called e.g.).
Thanks a lot for reading this far. Please note that I really like perf, I just
have trouble teaching people using it as many things are really really hard
and I hope we can work together to make perf much easier to use!
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-05-23 9:10 usability issues with inlining and backtraces Milian Wolff
@ 2016-05-23 14:11 ` Arnaldo Carvalho de Melo
2016-08-15 11:44 ` Milian Wolff
2016-05-23 14:59 ` Andi Kleen
1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-05-23 14:11 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Em Mon, May 23, 2016 at 11:10:00AM +0200, Milian Wolff escreveu:
> Hey all,
>
> I've recently encountered some nasty issues with perf which should - I hope -
> be possible to be fixed.
I've quickly read this, all interesting, but I need some time to digest
it fully, which I don't have right now, I'll get to it as soon as I can.
The fact that other open source tools do it as you expect is a precious
observation, as we can go look at how they solved problems.
Yeah, agreed that we need getting these tools to be more useful for
userland developers, and to, as much as we can, connect events connected
to source code.
Hopefully some other people will participate.
- Arnaldo
> First of all, I learned about `perf report -s sym,srcline`, most notably the
> `srcline`, which finally (!) allows me to correlate sample cost directly to
> line numbers, without trying to interpret the often quite convoluted output of
> `perf annotate`. But, take the following example code:
>
> ~~~~~~~~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 1000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~~~~~~~~
>
> Compile it with
>
> g++ -std=c++11 -g -O2 test.cpp
>
> Then profile it with perf:
>
> perf record --call-graph dwarf
>
> ########## perf report backtraces in face of inlining
>
> Now let us go through the the output of the report with srcline enabled:
>
> ~~~~~~~~~~~~
> perf report --stdio -s sym,srcline --no-children
>
> 21.07% [.] main random.tcc:3332
> |
> ---__libc_start_main
> _start
> 20.30% [.] main random.h:143
> |
> ---__libc_start_main
> _start
> ~~~~~~~~~~~~
>
> Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> attributed to the main function. This is of course correct, but very unhelpful
> to me as a programmer. I'm much more interested in the line inside test.cpp
> which triggered the call to random.tcc:3332 etc. Is there a way to get that
> data? Note how you can a way better backtrace when using GDB on the same
> binary as above:
>
> ~~~~~~~~~~~~
> gdb a.out
> break random.tcc:3332
> run
> Breakpoint 1, std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> (__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
> 3332 __sum += _RealType(__urng() - __urng.min()) * __tmp;
> (gdb) bt
> #0 std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> (__urng=...) at /usr/include/c++/6.1.1/bits/random.tcc:3332
> #1 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
> 16807ul, 0ul, 2147483647ul>, double>::operator() (this=<optimized out>)
> at /usr/include/c++/6.1.1/bits/random.h:185
> #2
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > (this=<optimized out>, __p=...,
> __urng=...)
> at /usr/include/c++/6.1.1/bits/random.h:1818
> #3
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > (__urng=<synthetic pointer>,
> this=<synthetic pointer>)
> at /usr/include/c++/6.1.1/bits/random.h:1809
> #4 main () at test.cpp:14
> ~~~~~~~~~~~~
>
> Why is perf (or is it libunwind?) not capable of getting the same backtrace?
>
> ######### srcline in perf report backtraces
>
> If we continue with the `perf report` output from the command above, we'll
> eventually see:
>
> ~~~~~~~~~~~~
>
> 4.60% [.] main test.cpp:14
> |
> ---__libc_start_main
> _start
>
> 3.63% [.] __hypot_finite __hypot_finite+257
> |
> ---hypot
> main
> __libc_start_main
> _start
>
> 3.25% [.] hypot hypot+55
> |
> ---main
> __libc_start_main
> _start
>
> 3.21% [.] __hypot_finite __hypot_finite+152
> |
> ---hypot
> main
> __libc_start_main
> _start
>
> 2.63% [.] __hypot_finite __hypot_finite+66
> |
> ---hypot
> main
> ... continued
> ~~~~~~~~~
>
> Here, I have again a few things to note:
>
> On one hand, we have the same issue as above, namely inlined functions being
> attributed directly to the parent function. See how the backtrace shows main
> calling hypot? Look at the source, I'm not calling hypot anywhere - it's
> std::norm calling it internally eventually. And GDB does know about that and
> can give me a proper backtrace.
>
> But, differently to above, the major gripe I have with this output is
> exemplified by this part:
>
> ~~~~~~~~~
> 3.63% [.] __hypot_finite __hypot_finite+257
> |
> ---hypot
> main
> __libc_start_main
> _start
> ~~~~~~~~~
>
> I added `srcline` to the report, but the backtrace still only contains the
> symbol name. I hope, that we can simply honor srcline there as well, to at
> least print something like this instead:
>
> ~~~~~~~~~
> 3.63% [.] __hypot_finite __hypot_finite+257
> |
> ---hypot
> main at test.cpp:14 ### <-- srcline here as well please?
> __libc_start_main
> _start
> ~~~~~~~~~
>
> Is there a reason why this was not yet implemented? If not, then I'll prepare
> a patch for that, as I hope this to be easy to implement.
>
> ########## symbol cost aggregation
>
> Once we have that implemented, can we maybe account for the following: I told
> `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> sym,srcline`. But the report seems to aggregate by `srcline`, because I see
> some symbols (__hypot_finite) multiple times, for different code points. Can
> we merge those and start the backtrace then at the different code points?
> Something like this would be my desirable output:
>
> ~~~~~~~~
> # see above for what I actually get, here I'm accumulating manually
> $ perf report --stdio -s sym,srcline --no-children |grep __hypot_finite+ | awk
> '{sum+=$1}END{print sum}'
> 34.96
> # this is what I would like to get directly
> $ perf report --stdio -s sym,srcline --no-children
> ...
> 34.96% [.] __hypot_finite
> |
> 3.63% | __hypot_finite __hypot_finite+257
> - ...
> 3.21% | __hypot_finite __hypot_finite+152
> - ...
> ...
> ~~~~~~~~~
>
> What do you think?
>
> ####### perf annotate for mere mortals
>
> To conclude this already long email, I want to raise the mess that is `perf
> annotate`. If I run it on my above data file, I get the following output:
>
> ~~~~~~~~~
> 1.23 │ mulsd 0x2f2(%rip),%xmm0 # 400bd0 <_IO_stdin_used+0x20>
> 2.47 │ subsd 0x2f2(%rip),%xmm0 # 400bd8 <_IO_stdin_used+0x28>
> │ _ZSt13__complex_absCd():
> │ #if _GLIBCXX_USE_C99_COMPLEX
> │ inline float
> │ __complex_abs(__complex__ float __z) { return
> __builtin_cabsf(__z); }
> │
> │ inline double
> │ __complex_abs(__complex__ double __z) { return
> __builtin_cabs(__z); }
> 1.23 │ → callq cabs@plt
> │ _ZNSt12_Norm_helperILb1EE8_S_do_itIdEET_RKSt7complexIS2_E():
> │ {
> │ template<typename _Tp>
> │ static inline _Tp _S_do_it(const complex<_Tp>& __z)
> │ {
> │ _Tp __res = std::abs(__z);
> │ return __res * __res;
> │ mulsd %xmm0,%xmm0
> │ main():
> │ for (int i = 0; i < 1000000; ++i) {
> 1.23 │ sub $0x1,%ebp
> │ s += norm(complex<double>(uniform(engine),
> uniform(engine)));
> 7.41 │ addsd 0x8(%rsp),%xmm0
> │ movsd %xmm0,0x8(%rsp)
> │ int main()
> │ {
> │ uniform_real_distribution<double> uniform(-1E5, 1E5);
> │ default_random_engine engine;
> │ double s = 0;
> │ for (int i = 0; i < 1000000; ++i) {
> │ ↓ je 24e
> │
> _ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm():
> │ struct _Mod<_Tp, __m, __a, __c, true, __s>
> │ {
> │ static _Tp
> │ __calc(_Tp __x)
> │ {
> │ _Tp __res = __a * __x + __c;
> │194: imul $0x41a7,%rbx,%rbx
>
> ~~~~~~~
>
> I personally think this output is extremely hard to interpret. I see the value
> of the above output when you want to look at the cost of hot computational
> kernels, but for most user space applications the above is pretty much
> useless. Most notably, note that the above shows self cost only and I could
> not find a way to trigger inclusive cost visualization. It makes me believe
> that the addition of the complex norms is the most costly operation in my
> code. That is not true, the inclusive cost of the random engine as well as
> actually computing the norm is way higher, i.e. I see
>
> 1.23 │ → callq cabs@plt
>
> but from the output of `perf report` above that triggers the calls to
> `__hypot` which have close to 40% of the cost.
>
> As such, I propose a "simplified" `perf annotate` output, which gets closer to
> what you'll see in other profilers, e.g. VTune or also Microsoft Visual
> studio's sampling profilers: Use the source files and annotate those with
> inclusive cost. Only show the cost of individual binary instructions (status
> quo) when explicitly asked. I.e. I want the output to look something like that
> (I made up the actual percentages):
>
> ~~~~~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> 1% uniform_real_distribution<double> uniform(-1E5, 1E5);
> 1% default_random_engine engine;
> double s = 0;
> 1% for (int i = 0; i < 1000000; ++i) {
> 95% s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> 2% cout << s << '\n';
> return 0;
> }
> ~~~~~~~~~
>
> Bonus points if I could then navigate to a source line and dig deeper (i.e.
> see the instructions for this line and then jump to a function that is being
> called e.g.).
>
> Thanks a lot for reading this far. Please note that I really like perf, I just
> have trouble teaching people using it as many things are really really hard
> and I hope we can work together to make perf much easier to use!
>
> Thanks
> --
> Milian Wolff | milian.wolff@kdab.com | Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-05-23 14:11 ` Arnaldo Carvalho de Melo
@ 2016-08-15 11:44 ` Milian Wolff
0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2016-08-15 11:44 UTC (permalink / raw)
To: acme; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 2513 bytes --]
On Monday, May 23, 2016 11:11:04 AM CEST Arnaldo Carvalho de Melo wrote:
> Em Mon, May 23, 2016 at 11:10:00AM +0200, Milian Wolff escreveu:
> > Hey all,
> >
> > I've recently encountered some nasty issues with perf which should - I
> > hope - be possible to be fixed.
>
> I've quickly read this, all interesting, but I need some time to digest
> it fully, which I don't have right now, I'll get to it as soon as I can.
>
> The fact that other open source tools do it as you expect is a precious
> observation, as we can go look at how they solved problems.
I have now found something very interesting in the perf code base:
The addr2line implementation with HAVE_LIBBFD_SUPPORT is actually capable of
producing backtraces that include inliners, similar to what GDB can do. If I
add debug output to the bfd_find_inliner_info loop inside addr2line in
srcline.c, and force unwind_inlines to true, then I see e.g.:
~~~~~~~~~~
inline: 0 _ZNSt8__detail4_ModImLm2147483647ELm16807ELm0ELb1ELb1EE6__calcEm (/
usr/include/c++/6.1.1/bits/random.h:143)
inline: 1 _ZNSt8__detail5__modImLm2147483647ELm16807ELm0EEET_S1_ (/usr/
include/c++/6.1.1/bits/random.h:151)
inline: 2 _ZNSt26linear_congruential_engineImLm16807ELm0ELm2147483647EEclEv (/
usr/include/c++/6.1.1/bits/random.h:332)
inline: 3
_ZSt18generate_canonicalIdLm53ESt26linear_congruential_engineImLm16807ELm0ELm2147483647EEET_RT1_
(/usr/include/c++/6.1.1/bits/random.tcc:3332)
inline: 4
_ZNSt8__detail8_AdaptorISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEdEclEv
(/usr/include/c++/6.1.1/bits/random.h:185)
inline: 5
_ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_RKNS0_10param_typeE
(/usr/include/c++/6.1.1/bits/random.h:1818)
inline: 6
_ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_
(/usr/include/c++/6.1.1/bits/random.h:1809)
inline: 7 main (/path/to/test.cpp:14)
~~~~~~~~~~
I would really like to see this information be included in the call graph
shown by perf, e.g. in the stdio/browser output of perf report, but also in
perf script etc. pp.
Can someone suggest a way to do that? These entries cannot be represented by a
normal instruction pointer address, so extending the existing callchains with
additional entries is not straight forward to do.
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-05-23 9:10 usability issues with inlining and backtraces Milian Wolff
2016-05-23 14:11 ` Arnaldo Carvalho de Melo
@ 2016-05-23 14:59 ` Andi Kleen
2016-05-23 15:18 ` Milian Wolff
1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2016-05-23 14:59 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Milian Wolff <milian.wolff@kdab.com> writes:
> Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> attributed to the main function. This is of course correct, but very unhelpful
> to me as a programmer. I'm much more interested in the line inside test.cpp
> which triggered the call to random.tcc:3332 etc. Is there a way to get that
> data? Note how you can a way better backtrace when using GDB on the same
> binary as above:
Yes this works using --call-graph ...,address
If you use srcfile it can resolve inlines. it is currently not supported
for other backtraces, but yes expanding them in the normal history
would be useful.
> On one hand, we have the same issue as above, namely inlined functions being
> attributed directly to the parent function. See how the backtrace shows main
> calling hypot? Look at the source, I'm not calling hypot anywhere - it's
> std::norm calling it internally eventually. And GDB does know about that and
> can give me a proper backtrace.
Use srcfile then.
>
> But, differently to above, the major gripe I have with this output is
> exemplified by this part:
>
> ~~~~~~~~~
> 3.63% [.] __hypot_finite __hypot_finite+257
> |
> ---hypot
> main
> __libc_start_main
> _start
> ~~~~~~~~~
>
> I added `srcline` to the report, but the backtrace still only contains the
> symbol name. I hope, that we can simply honor srcline there as well, to at
> least print something like this instead:
--call-graph ....,address
(or --branch-history)
> ########## symbol cost aggregation
>
> Once we have that implemented, can we maybe account for the following: I told
> `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> sym,srcline`. But the report seems to aggregate by `srcline`, because I see
> some symbols (__hypot_finite) multiple times, for different code points. Can
> we merge those and start the backtrace then at the different code points?
> Something like this would be my desirable output:
perf sorts by address, but the compiler can generate the same symbol
in multiple versions or in may inlines instances. That's probably
more work to fix.
> As such, I propose a "simplified" `perf annotate` output, which gets closer to
> what you'll see in other profilers, e.g. VTune or also Microsoft Visual
> studio's sampling profilers: Use the source files and annotate those with
> inclusive cost. Only show the cost of individual binary instructions (status
> quo) when explicitly asked. I.e. I want the output to look something like that
> (I made up the actual percentages):
Sounds like the fallacy of abstracted performance analysis.
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-05-23 14:59 ` Andi Kleen
@ 2016-05-23 15:18 ` Milian Wolff
2016-06-13 16:07 ` Milian Wolff
0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-05-23 15:18 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 4987 bytes --]
On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > attributed to the main function. This is of course correct, but very
> > unhelpful to me as a programmer. I'm much more interested in the line
> > inside test.cpp which triggered the call to random.tcc:3332 etc. Is there
> > a way to get that data? Note how you can a way better backtrace when
> > using GDB on the same
> > binary as above:
>
> Yes this works using --call-graph ...,address
>
> If you use srcfile it can resolve inlines. it is currently not supported
> for other backtraces, but yes expanding them in the normal history
> would be useful.
Can you expand on this part a bit please? It's the first time I realize that I
can add a sort key to `--call-graph`, so at the very least this shows that the
defaults are bad.
For the example I'm now trying this:
$ perf report -g graph,address -s sym,srcline --no-children --stdio
# Overhead Symbol Source:Line
# ........ ........................... ..................
#
21.82% [.] main random.tcc:3332
|
|--3.84%--main +8388740
| __libc_start_main +140389813309681
| _start +8388649
| ...
6.29% [.] __hypot_finite __hypot_finite+152
|
---hypot +140389819301908
main +8388987
__libc_start_main +140389813309681
_start +8388649
-> much better, but still unusable backtraces. And my `perf report` only
allows this:
sort_key: call graph sort key (function|address)
So we need to add support for "|srcline" here as well?
> > On one hand, we have the same issue as above, namely inlined functions
> > being attributed directly to the parent function. See how the backtrace
> > shows main calling hypot? Look at the source, I'm not calling hypot
> > anywhere - it's std::norm calling it internally eventually. And GDB does
> > know about that and can give me a proper backtrace.
>
> Use srcfile then.
See above, can you please extend this answer a bit? How would you use srcfile,
and where, to get the full backtrace? Is that maybe a patch that is not yet
included in acme's perf/core branch?
> > But, differently to above, the major gripe I have with this output is
> > exemplified by this part:
> >
> > ~~~~~~~~~
> >
> > 3.63% [.] __hypot_finite __hypot_finite+257
> >
> > ---hypot
> >
> > main
> > __libc_start_main
> > _start
> >
> > ~~~~~~~~~
> >
> > I added `srcline` to the report, but the backtrace still only contains the
> > symbol name. I hope, that we can simply honor srcline there as well, to at
>
> > least print something like this instead:
> --call-graph ....,address
>
> (or --branch-history)
Both produce output like I've shown above, i.e. they always put down
<symbol> +<offset>
insetead of using the much more useable
<symbol> <file>:<line>
format. What am I missing?
> > ########## symbol cost aggregation
> >
> > Once we have that implemented, can we maybe account for the following: I
> > told `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> > sym,srcline`. But the report seems to aggregate by `srcline`, because I
> > see some symbols (__hypot_finite) multiple times, for different code
> > points. Can we merge those and start the backtrace then at the different
> > code points?
> > Something like this would be my desirable output:
>
> perf sorts by address, but the compiler can generate the same symbol
> in multiple versions or in may inlines instances. That's probably
> more work to fix.
>
> > As such, I propose a "simplified" `perf annotate` output, which gets
> > closer to what you'll see in other profilers, e.g. VTune or also
> > Microsoft Visual studio's sampling profilers: Use the source files and
> > annotate those with inclusive cost. Only show the cost of individual
> > binary instructions (status quo) when explicitly asked. I.e. I want the
> > output to look something like that
> > (I made up the actual percentages):
>
> Sounds like the fallacy of abstracted performance analysis.
Can you please elaborate that some more as well? For me, as a user-space
application developer, getting an aggregated inclusive cost for the source
code I actually write is extremely useful. Of course, when it's not directly
clear from that view why something is slow I still need to dig deeper and get
to the current in-depth assembly view. But why is having a simplified overview
a fallacy? Why is this abstracted performance analysis?
Thanks a lot for your input!
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-05-23 15:18 ` Milian Wolff
@ 2016-06-13 16:07 ` Milian Wolff
2016-08-15 9:32 ` Milian Wolff
0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-06-13 16:07 UTC (permalink / raw)
To: andi; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 5475 bytes --]
Ping?
Andi, I'd be very interested in your answers on my questions below. Most
notably, `srcline` with `--call-graph ...,address` seems to be highly buggy
for me.
Thanks
On Monday, May 23, 2016 5:18:53 PM CEST Milian Wolff wrote:
> On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > > attributed to the main function. This is of course correct, but very
> > > unhelpful to me as a programmer. I'm much more interested in the line
> > > inside test.cpp which triggered the call to random.tcc:3332 etc. Is
> > > there
> > > a way to get that data? Note how you can a way better backtrace when
> > > using GDB on the same
> >
> > > binary as above:
> > Yes this works using --call-graph ...,address
> >
> > If you use srcfile it can resolve inlines. it is currently not supported
> > for other backtraces, but yes expanding them in the normal history
> > would be useful.
>
> Can you expand on this part a bit please? It's the first time I realize that
> I can add a sort key to `--call-graph`, so at the very least this shows
> that the defaults are bad.
>
> For the example I'm now trying this:
>
> $ perf report -g graph,address -s sym,srcline --no-children --stdio
>
> # Overhead Symbol Source:Line
> # ........ ........................... ..................
> #
> 21.82% [.] main random.tcc:3332
>
> |--3.84%--main +8388740
> |
> | __libc_start_main +140389813309681
> | _start +8388649
> |
> | ...
>
> 6.29% [.] __hypot_finite __hypot_finite+152
>
> ---hypot +140389819301908
> main +8388987
> __libc_start_main +140389813309681
> _start +8388649
>
> -> much better, but still unusable backtraces. And my `perf report` only
> allows this:
>
> sort_key: call graph sort key (function|address)
>
> So we need to add support for "|srcline" here as well?
>
> > > On one hand, we have the same issue as above, namely inlined functions
> > > being attributed directly to the parent function. See how the backtrace
> > > shows main calling hypot? Look at the source, I'm not calling hypot
> > > anywhere - it's std::norm calling it internally eventually. And GDB does
> > > know about that and can give me a proper backtrace.
> >
> > Use srcfile then.
>
> See above, can you please extend this answer a bit? How would you use
> srcfile, and where, to get the full backtrace? Is that maybe a patch that
> is not yet included in acme's perf/core branch?
>
> > > But, differently to above, the major gripe I have with this output is
> > > exemplified by this part:
> > >
> > > ~~~~~~~~~
> > >
> > > 3.63% [.] __hypot_finite __hypot_finite+257
> > >
> > > ---hypot
> > >
> > > main
> > > __libc_start_main
> > > _start
> > >
> > > ~~~~~~~~~
> > >
> > > I added `srcline` to the report, but the backtrace still only contains
> > > the
> > > symbol name. I hope, that we can simply honor srcline there as well, to
> > > at
> >
> > > least print something like this instead:
> > --call-graph ....,address
> >
> > (or --branch-history)
>
> Both produce output like I've shown above, i.e. they always put down
>
> <symbol> +<offset>
>
> insetead of using the much more useable
>
> <symbol> <file>:<line>
>
> format. What am I missing?
>
> > > ########## symbol cost aggregation
> > >
> > > Once we have that implemented, can we maybe account for the following: I
> > > told `perf report` to aggregate by symbol first, then srcline, i.e. `-s
> > > sym,srcline`. But the report seems to aggregate by `srcline`, because I
> > > see some symbols (__hypot_finite) multiple times, for different code
> > > points. Can we merge those and start the backtrace then at the different
> > > code points?
> >
> > > Something like this would be my desirable output:
> > perf sorts by address, but the compiler can generate the same symbol
> > in multiple versions or in may inlines instances. That's probably
> > more work to fix.
> >
> > > As such, I propose a "simplified" `perf annotate` output, which gets
> > > closer to what you'll see in other profilers, e.g. VTune or also
> > > Microsoft Visual studio's sampling profilers: Use the source files and
> > > annotate those with inclusive cost. Only show the cost of individual
> > > binary instructions (status quo) when explicitly asked. I.e. I want the
> > > output to look something like that
> >
> > > (I made up the actual percentages):
> > Sounds like the fallacy of abstracted performance analysis.
>
> Can you please elaborate that some more as well? For me, as a user-space
> application developer, getting an aggregated inclusive cost for the source
> code I actually write is extremely useful. Of course, when it's not directly
> clear from that view why something is slow I still need to dig deeper and
> get to the current in-depth assembly view. But why is having a simplified
> overview a fallacy? Why is this abstracted performance analysis?
>
> Thanks a lot for your input!
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-06-13 16:07 ` Milian Wolff
@ 2016-08-15 9:32 ` Milian Wolff
2016-08-15 11:32 ` Milian Wolff
0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-08-15 9:32 UTC (permalink / raw)
To: andi; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 6152 bytes --]
On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> Ping?
>
> Andi, I'd be very interested in your answers on my questions below. Most
> notably, `srcline` with `--call-graph ...,address` seems to be highly buggy
> for me.
Reping again.
Andi, or anyone else - I would really appreciate some more feedback on my
questions below.
I esp. wonder how to leverage srcline and addresses in the call-graph, as that
has the potential to greatly increase perf's value to me.
Thanks
> On Monday, May 23, 2016 5:18:53 PM CEST Milian Wolff wrote:
> > On Monday, May 23, 2016 7:59:54 AM CEST Andi Kleen wrote:
> > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > Here, the cost of inlined functions (random.tcc:3332,random.h:143) is
> > > > attributed to the main function. This is of course correct, but very
> > > > unhelpful to me as a programmer. I'm much more interested in the line
> > > > inside test.cpp which triggered the call to random.tcc:3332 etc. Is
> > > > there
> > > > a way to get that data? Note how you can a way better backtrace when
> > > > using GDB on the same
> > >
> > > > binary as above:
> > > Yes this works using --call-graph ...,address
> > >
> > > If you use srcfile it can resolve inlines. it is currently not supported
> > > for other backtraces, but yes expanding them in the normal history
> > > would be useful.
> >
> > Can you expand on this part a bit please? It's the first time I realize
> > that I can add a sort key to `--call-graph`, so at the very least this
> > shows that the defaults are bad.
> >
> > For the example I'm now trying this:
> > $ perf report -g graph,address -s sym,srcline --no-children --stdio
> >
> > # Overhead Symbol Source:Line
> > # ........ ........................... ..................
> > #
> >
> > 21.82% [.] main random.tcc:3332
> >
> > |--3.84%--main +8388740
> > |
> > | __libc_start_main +140389813309681
> > | _start +8388649
> > |
> > | ...
> >
> > 6.29% [.] __hypot_finite __hypot_finite+152
> >
> > ---hypot +140389819301908
> >
> > main +8388987
> > __libc_start_main +140389813309681
> > _start +8388649
> >
> > -> much better, but still unusable backtraces. And my `perf report` only
> >
> > allows this:
> > sort_key: call graph sort key (function|address)
> >
> > So we need to add support for "|srcline" here as well?
> >
> > > > On one hand, we have the same issue as above, namely inlined functions
> > > > being attributed directly to the parent function. See how the
> > > > backtrace
> > > > shows main calling hypot? Look at the source, I'm not calling hypot
> > > > anywhere - it's std::norm calling it internally eventually. And GDB
> > > > does
> > > > know about that and can give me a proper backtrace.
> > >
> > > Use srcfile then.
> >
> > See above, can you please extend this answer a bit? How would you use
> > srcfile, and where, to get the full backtrace? Is that maybe a patch that
> > is not yet included in acme's perf/core branch?
> >
> > > > But, differently to above, the major gripe I have with this output is
> > > > exemplified by this part:
> > > >
> > > > ~~~~~~~~~
> > > >
> > > > 3.63% [.] __hypot_finite __hypot_finite+257
> > > >
> > > > ---hypot
> > > >
> > > > main
> > > > __libc_start_main
> > > > _start
> > > >
> > > > ~~~~~~~~~
> > > >
> > > > I added `srcline` to the report, but the backtrace still only contains
> > > > the
> > > > symbol name. I hope, that we can simply honor srcline there as well,
> > > > to
> > > > at
> > >
> > > > least print something like this instead:
> > > --call-graph ....,address
> > >
> > > (or --branch-history)
> >
> > Both produce output like I've shown above, i.e. they always put down
> >
> > <symbol> +<offset>
> >
> > insetead of using the much more useable
> >
> > <symbol> <file>:<line>
> >
> > format. What am I missing?
> >
> > > > ########## symbol cost aggregation
> > > >
> > > > Once we have that implemented, can we maybe account for the following:
> > > > I
> > > > told `perf report` to aggregate by symbol first, then srcline, i.e.
> > > > `-s
> > > > sym,srcline`. But the report seems to aggregate by `srcline`, because
> > > > I
> > > > see some symbols (__hypot_finite) multiple times, for different code
> > > > points. Can we merge those and start the backtrace then at the
> > > > different
> > > > code points?
> > >
> > > > Something like this would be my desirable output:
> > > perf sorts by address, but the compiler can generate the same symbol
> > > in multiple versions or in may inlines instances. That's probably
> > > more work to fix.
> > >
> > > > As such, I propose a "simplified" `perf annotate` output, which gets
> > > > closer to what you'll see in other profilers, e.g. VTune or also
> > > > Microsoft Visual studio's sampling profilers: Use the source files and
> > > > annotate those with inclusive cost. Only show the cost of individual
> > > > binary instructions (status quo) when explicitly asked. I.e. I want
> > > > the
> > > > output to look something like that
> > >
> > > > (I made up the actual percentages):
> > > Sounds like the fallacy of abstracted performance analysis.
> >
> > Can you please elaborate that some more as well? For me, as a user-space
> > application developer, getting an aggregated inclusive cost for the source
> > code I actually write is extremely useful. Of course, when it's not
> > directly clear from that view why something is slow I still need to dig
> > deeper and get to the current in-depth assembly view. But why is having a
> > simplified overview a fallacy? Why is this abstracted performance
> > analysis?
> >
> > Thanks a lot for your input!
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-08-15 9:32 ` Milian Wolff
@ 2016-08-15 11:32 ` Milian Wolff
2016-08-15 17:13 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Milian Wolff @ 2016-08-15 11:32 UTC (permalink / raw)
To: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]
On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> > Ping?
> >
> > Andi, I'd be very interested in your answers on my questions below. Most
> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> > buggy
> > for me.
>
> Reping again.
>
> Andi, or anyone else - I would really appreciate some more feedback on my
> questions below.
>
> I esp. wonder how to leverage srcline and addresses in the call-graph, as
> that has the potential to greatly increase perf's value to me.
OK, I have found at least two culprits that partially explain the behavior.
On one hand, I'm simply missing debug information for some libraries, such as
libm, libc etc. pp. Due to that, the addr2line step fails and sym+offset is
displayed.
On the other hand, there seems to be a bug in perf:
When hist_entry__get_srcline calls get_srcline I see from debug output
he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
But in callchain_list__sym_name, I see:
cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
So it seems like the callchain_list is missing a normalization step that is
applied to the IP addresses stored in hist_entry. Can someone point me into
the right direction?
If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to
map__rip_2objdump this feature works much better for me, but I doubt that this
is the correct fix.
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-08-15 11:32 ` Milian Wolff
@ 2016-08-15 17:13 ` Andi Kleen
2016-08-16 15:13 ` Namhyung Kim
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2016-08-15 17:13 UTC (permalink / raw)
To: Milian Wolff; +Cc: jolsa, namhyung, linux-perf-users
Milian Wolff <milian.wolff@kdab.com> writes:
> On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
>> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
>> > Ping?
>> >
>> > Andi, I'd be very interested in your answers on my questions below. Most
>> > notably, `srcline` with `--call-graph ...,address` seems to be highly
>> > buggy
>> > for me.
>>
>> Reping again.
>>
>> Andi, or anyone else - I would really appreciate some more feedback on my
>> questions below.
>>
>> I esp. wonder how to leverage srcline and addresses in the call-graph, as
>> that has the potential to greatly increase perf's value to me.
>
> OK, I have found at least two culprits that partially explain the behavior.
>
> On one hand, I'm simply missing debug information for some libraries, such as
> libm, libc etc. pp. Due to that, the addr2line step fails and sym+offset is
> displayed.
>
> On the other hand, there seems to be a bug in perf:
>
> When hist_entry__get_srcline calls get_srcline I see from debug output
>
> he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
>
> But in callchain_list__sym_name, I see:
>
> cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
>
> So it seems like the callchain_list is missing a normalization step that is
> applied to the IP addresses stored in hist_entry. Can someone point me into
> the right direction?
>
> If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to
> map__rip_2objdump this feature works much better for me, but I doubt that this
> is the correct fix.
Yes it's something like that. Namhyung or Jiri probably know
what the right fix would be.
-Andi
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-08-15 17:13 ` Andi Kleen
@ 2016-08-16 15:13 ` Namhyung Kim
2016-08-16 15:45 ` Milian Wolff
0 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2016-08-16 15:13 UTC (permalink / raw)
To: Andi Kleen; +Cc: Milian Wolff, jolsa, linux-perf-users
On Mon, Aug 15, 2016 at 10:13:24AM -0700, Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
>
> > On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> >> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> >> > Ping?
> >> >
> >> > Andi, I'd be very interested in your answers on my questions below. Most
> >> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> >> > buggy
> >> > for me.
> >>
> >> Reping again.
> >>
> >> Andi, or anyone else - I would really appreciate some more feedback on my
> >> questions below.
> >>
> >> I esp. wonder how to leverage srcline and addresses in the call-graph, as
> >> that has the potential to greatly increase perf's value to me.
> >
> > OK, I have found at least two culprits that partially explain the behavior.
> >
> > On one hand, I'm simply missing debug information for some libraries, such as
> > libm, libc etc. pp. Due to that, the addr2line step fails and sym+offset is
> > displayed.
> >
> > On the other hand, there seems to be a bug in perf:
> >
> > When hist_entry__get_srcline calls get_srcline I see from debug output
> >
> > he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
> >
> > But in callchain_list__sym_name, I see:
> >
> > cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
> >
> > So it seems like the callchain_list is missing a normalization step that is
> > applied to the IP addresses stored in hist_entry. Can someone point me into
> > the right direction?
> >
> > If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start to
> > map__rip_2objdump this feature works much better for me, but I doubt that this
> > is the correct fix.
>
> Yes it's something like that. Namhyung or Jiri probably know
> what the right fix would be.
I think the unwind libs should set entry->ip to al.addr instead of
ip. The __hists__add_entry() sets hist_entry->ip using al->addr.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: usability issues with inlining and backtraces
2016-08-16 15:13 ` Namhyung Kim
@ 2016-08-16 15:45 ` Milian Wolff
0 siblings, 0 replies; 11+ messages in thread
From: Milian Wolff @ 2016-08-16 15:45 UTC (permalink / raw)
To: linux-perf-users; +Cc: Namhyung Kim
[-- Attachment #1: Type: text/plain, Size: 2376 bytes --]
On Mittwoch, 17. August 2016 00:13:10 CEST Namhyung Kim wrote:
> On Mon, Aug 15, 2016 at 10:13:24AM -0700, Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > On Monday, August 15, 2016 11:32:33 AM CEST Milian Wolff wrote:
> > >> On Monday, June 13, 2016 6:07:13 PM CEST Milian Wolff wrote:
> > >> > Ping?
> > >> >
> > >> > Andi, I'd be very interested in your answers on my questions below.
> > >> > Most
> > >> > notably, `srcline` with `--call-graph ...,address` seems to be highly
> > >> > buggy
> > >> > for me.
> > >>
> > >> Reping again.
> > >>
> > >> Andi, or anyone else - I would really appreciate some more feedback on
> > >> my
> > >> questions below.
> > >>
> > >> I esp. wonder how to leverage srcline and addresses in the call-graph,
> > >> as
> > >> that has the potential to greatly increase perf's value to me.
> > >
> > > OK, I have found at least two culprits that partially explain the
> > > behavior.
> > >
> > > On one hand, I'm simply missing debug information for some libraries,
> > > such as libm, libc etc. pp. Due to that, the addr2line step fails and
> > > sym+offset is displayed.
> > >
> > > On the other hand, there seems to be a bug in perf:
> > >
> > > When hist_entry__get_srcline calls get_srcline I see from debug output
> > >
> > > he->ip=307e, sym=main -> rip_obj2dump=40307e -> mandelbrot.h:39
> > >
> > > But in callchain_list__sym_name, I see:
> > >
> > > cl->ip=40373d, sym=main -> rip_obj2dump=80373d-> main+8390413
> > >
> > > So it seems like the callchain_list is missing a normalization step that
> > > is
> > > applied to the IP addresses stored in hist_entry. Can someone point me
> > > into
> > > the right direction?
> > >
> > > If I change callchain_list__sym_name to pass cl->ip - cl->ms.map->start
> > > to
> > > map__rip_2objdump this feature works much better for me, but I doubt
> > > that this is the correct fix.
> >
> > Yes it's something like that. Namhyung or Jiri probably know
> > what the right fix would be.
>
> I think the unwind libs should set entry->ip to al.addr instead of
> ip. The __hists__add_entry() sets hist_entry->ip using al->addr.
Thank you, that works. I've submitted a patch accordingly.
Cheers
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-08-16 15:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-23 9:10 usability issues with inlining and backtraces Milian Wolff
2016-05-23 14:11 ` Arnaldo Carvalho de Melo
2016-08-15 11:44 ` Milian Wolff
2016-05-23 14:59 ` Andi Kleen
2016-05-23 15:18 ` Milian Wolff
2016-06-13 16:07 ` Milian Wolff
2016-08-15 9:32 ` Milian Wolff
2016-08-15 11:32 ` Milian Wolff
2016-08-15 17:13 ` Andi Kleen
2016-08-16 15:13 ` Namhyung Kim
2016-08-16 15:45 ` Milian Wolff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).