* perf report --inline usability issues when sorting by function
@ 2017-04-27 21:09 Milian Wolff
2017-04-27 22:03 ` Milian Wolff
0 siblings, 1 reply; 2+ messages in thread
From: Milian Wolff @ 2017-04-27 21:09 UTC (permalink / raw)
To: Perf Users; +Cc: Jin, Yao
[-- Attachment #1: Type: text/plain, Size: 4381 bytes --]
Hey Jin, others
I observe a usability issue with the following example code:
~~~~~~~~~~~~~~~~~~~~
#include <cmath>
#include <random>
#include <iostream>
using namespace std;
int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 10000000; ++i) {
s += uniform(engine);
}
cout << "random sum: " << s << '\n';
return 0;
}
~~~~~~~~~~~~~~~~~~~~
Compile it with `g++ -O2 -g` then record it with `perf record --call-graph
dwarf` and inspect the data with inliners enabled. Here are some excerpts that
show the issues:
~~~~~~~~~~~~~~~~~~~~
$ perf report --stdio --inline --no-children
Failed to open [ext4], continuing without symbols
# To display the perf.data header info, please use --header/--header-only
options.
#
#
# Total Lost Samples: 0
#
# Samples: 499 of event 'cycles'
# Event count (approx.): 329354953
#
# Overhead Command Shared Object Symbol
# ........ ......... ................. ................................
#
96.70% ex_random ex_random [.] main
|
---main
__libc_start_main
_start
0.96% ex_random ld-2.25.so [.] do_lookup_x
|
---do_lookup_x
_dl_lookup_symbol_x
_dl_relocate_object
dl_main
_dl_sysdep_start
_dl_start
_dl_start_user
0.30% ex_random ld-2.25.so [.] _dl_lookup_symbol_x
...
~~~~~~~~~~~~~~~~~~~~~~
Note how no inlined frames are actually shown. I think the issue here is that
when we group by function name (the default), then multiple samples with
different IPs will get merged. Once we display the data then in e.g. ui/stdio/
hist.c, we will only have one node left for this function, with an arbitrary
IP (of the first sample, I guess).
This is problematic, because depending on the IP, different inlined call
stacks may have been used. This is observable for the above use-case by
changing the grouping algorithm, e.g.:
~~~~~~~~~~~~~~~~~~~~~~~
$ perf report --stdio --inline --no-children -g srcline
Invalid callchain mode: srcline
Invalid callchain order: srcline
Failed to open [ext4], continuing without symbols
# To display the perf.data header info, please use --header/--header-only
options.
#
#
# Total Lost Samples: 0
#
# Samples: 499 of event 'cycles'
# Event count (approx.): 329354953
#
# Overhead Command Shared Object Symbol
# ........ ......... ................. ................................
#
96.70% ex_random ex_random [.] main
|
|--28.97%--main random.tcc:3326
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /home/milian/projects/kdab/training-material/addon/
profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
| __libc_start_main
| _start
|
|--28.14%--main random.h:143
| /usr/include/c++/6.3.1/bits/random.h:151 (inline)
| /usr/include/c++/6.3.1/bits/random.h:332 (inline)
| /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /home/milian/projects/kdab/training-material/addon/
profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
| __libc_start_main
| _start
|
...
~~~~~~~~~~~~~~~~~~~~~~~
Has anyone any suggestion on how to fix this? Some ideas:
- adapt the function aggregation to also take the inlined function into
account
- always add inlined frames to the graph
- keep all IPs and their cost even when aggregating functions
Cheers
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: perf report --inline usability issues when sorting by function
2017-04-27 21:09 perf report --inline usability issues when sorting by function Milian Wolff
@ 2017-04-27 22:03 ` Milian Wolff
0 siblings, 0 replies; 2+ messages in thread
From: Milian Wolff @ 2017-04-27 22:03 UTC (permalink / raw)
To: Milian Wolff; +Cc: Perf Users, Jin, Yao
[-- Attachment #1: Type: text/plain, Size: 4750 bytes --]
On Donnerstag, 27. April 2017 23:09:10 CEST Milian Wolff wrote:
> Hey Jin, others
>
> I observe a usability issue with the following example code:
>
> ~~~~~~~~~~~~~~~~~~~~
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 10000000; ++i) {
> s += uniform(engine);
> }
> cout << "random sum: " << s << '\n';
> return 0;
> }
> ~~~~~~~~~~~~~~~~~~~~
>
> Compile it with `g++ -O2 -g` then record it with `perf record --call-graph
> dwarf` and inspect the data with inliners enabled. Here are some excerpts
> that show the issues:
>
> ~~~~~~~~~~~~~~~~~~~~
> $ perf report --stdio --inline --no-children
> Failed to open [ext4], continuing without symbols
> # To display the perf.data header info, please use --header/--header-only
> options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 499 of event 'cycles'
> # Event count (approx.): 329354953
> #
> # Overhead Command Shared Object Symbol
> # ........ ......... ................. ................................
> #
> 96.70% ex_random ex_random [.] main
>
> ---main
> __libc_start_main
> _start
>
> 0.96% ex_random ld-2.25.so [.] do_lookup_x
>
> ---do_lookup_x
> _dl_lookup_symbol_x
> _dl_relocate_object
> dl_main
> _dl_sysdep_start
> _dl_start
> _dl_start_user
>
> 0.30% ex_random ld-2.25.so [.] _dl_lookup_symbol_x
> ...
> ~~~~~~~~~~~~~~~~~~~~~~
>
> Note how no inlined frames are actually shown. I think the issue here is
> that when we group by function name (the default), then multiple samples
> with different IPs will get merged. Once we display the data then in e.g.
> ui/stdio/ hist.c, we will only have one node left for this function, with
> an arbitrary IP (of the first sample, I guess).
>
> This is problematic, because depending on the IP, different inlined call
> stacks may have been used. This is observable for the above use-case by
> changing the grouping algorithm, e.g.:
>
> ~~~~~~~~~~~~~~~~~~~~~~~
> $ perf report --stdio --inline --no-children -g srcline
> Invalid callchain mode: srcline
> Invalid callchain order: srcline
> Failed to open [ext4], continuing without symbols
> # To display the perf.data header info, please use --header/--header-only
> options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 499 of event 'cycles'
> # Event count (approx.): 329354953
> #
> # Overhead Command Shared Object Symbol
> # ........ ......... ................. ................................
> #
> 96.70% ex_random ex_random [.] main
>
> |--28.97%--main random.tcc:3326
> |
> | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
> | /home/milian/projects/kdab/training-material/addon/
>
> profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
>
> | __libc_start_main
> | _start
> |
> |--28.14%--main random.h:143
> |
> | /usr/include/c++/6.3.1/bits/random.h:151 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:332 (inline)
> | /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
> | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
> | /home/milian/projects/kdab/training-material/addon/
>
> profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
>
> | __libc_start_main
> | _start
>
> ...
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> Has anyone any suggestion on how to fix this? Some ideas:
>
> - adapt the function aggregation to also take the inlined function into
> account
> - always add inlined frames to the graph
> - keep all IPs and their cost even when aggregating functions
I have no implemented the first option from the above list and sent a patch
for review. Feedback welcome! It seems to work quite nicely in my tests.
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2017-04-27 22:03 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-27 21:09 perf report --inline usability issues when sorting by function Milian Wolff
2017-04-27 22:03 ` Milian Wolff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).