linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf report --inline usability issues when sorting by function
@ 2017-04-27 21:09 Milian Wolff
  2017-04-27 22:03 ` Milian Wolff
  0 siblings, 1 reply; 2+ messages in thread
From: Milian Wolff @ 2017-04-27 21:09 UTC (permalink / raw)
  To: Perf Users; +Cc: Jin, Yao

[-- Attachment #1: Type: text/plain, Size: 4381 bytes --]

Hey Jin, others

I observe a usability issue with the following example code:

~~~~~~~~~~~~~~~~~~~~
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += uniform(engine);
    }
    cout << "random sum: " << s << '\n';
    return 0;
}
~~~~~~~~~~~~~~~~~~~~

Compile it with `g++ -O2 -g` then record it with `perf record --call-graph 
dwarf` and inspect the data with inliners enabled. Here are some excerpts that 
show the issues:

~~~~~~~~~~~~~~~~~~~~
$ perf report --stdio --inline --no-children
Failed to open [ext4], continuing without symbols
# To display the perf.data header info, please use --header/--header-only 
options.
#
#
# Total Lost Samples: 0
#
# Samples: 499  of event 'cycles'
# Event count (approx.): 329354953
#
# Overhead  Command    Shared Object      Symbol                          
# ........  .........  .................  ................................
#
    96.70%  ex_random  ex_random          [.] main
            |
            ---main
               __libc_start_main
               _start

     0.96%  ex_random  ld-2.25.so         [.] do_lookup_x
            |
            ---do_lookup_x
               _dl_lookup_symbol_x
               _dl_relocate_object
               dl_main
               _dl_sysdep_start
               _dl_start
               _dl_start_user

     0.30%  ex_random  ld-2.25.so         [.] _dl_lookup_symbol_x
...
~~~~~~~~~~~~~~~~~~~~~~

Note how no inlined frames are actually shown. I think the issue here is that 
when we group by function name (the default), then multiple samples with 
different IPs will get merged. Once we display the data then in e.g. ui/stdio/
hist.c, we will only have one node left for this function, with an arbitrary 
IP (of the first sample, I guess).

This is problematic, because depending on the IP, different inlined call 
stacks may have been used. This is observable for the above use-case by 
changing the grouping algorithm, e.g.:

~~~~~~~~~~~~~~~~~~~~~~~
$ perf report --stdio --inline --no-children -g srcline
Invalid callchain mode: srcline
Invalid callchain order: srcline
Failed to open [ext4], continuing without symbols
# To display the perf.data header info, please use --header/--header-only 
options.
#
#
# Total Lost Samples: 0
#
# Samples: 499  of event 'cycles'
# Event count (approx.): 329354953
#
# Overhead  Command    Shared Object      Symbol                          
# ........  .........  .................  ................................
#
    96.70%  ex_random  ex_random          [.] main
            |          
            |--28.97%--main random.tcc:3326
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/kdab/training-material/addon/
profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
            |          __libc_start_main
            |          _start
            |          
            |--28.14%--main random.h:143
            |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
            |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/kdab/training-material/addon/
profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
            |          __libc_start_main
            |          _start
            |
...
~~~~~~~~~~~~~~~~~~~~~~~

Has anyone any suggestion on how to fix this? Some ideas:

- adapt the function aggregation to also take the inlined function into 
account
- always add inlined frames to the graph
- keep all IPs and their cost even when aggregating functions

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: perf report --inline usability issues when sorting by function
  2017-04-27 21:09 perf report --inline usability issues when sorting by function Milian Wolff
@ 2017-04-27 22:03 ` Milian Wolff
  0 siblings, 0 replies; 2+ messages in thread
From: Milian Wolff @ 2017-04-27 22:03 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, Jin, Yao

[-- Attachment #1: Type: text/plain, Size: 4750 bytes --]

On Donnerstag, 27. April 2017 23:09:10 CEST Milian Wolff wrote:
> Hey Jin, others
> 
> I observe a usability issue with the following example code:
> 
> ~~~~~~~~~~~~~~~~~~~~
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += uniform(engine);
>     }
>     cout << "random sum: " << s << '\n';
>     return 0;
> }
> ~~~~~~~~~~~~~~~~~~~~
> 
> Compile it with `g++ -O2 -g` then record it with `perf record --call-graph
> dwarf` and inspect the data with inliners enabled. Here are some excerpts
> that show the issues:
> 
> ~~~~~~~~~~~~~~~~~~~~
> $ perf report --stdio --inline --no-children
> Failed to open [ext4], continuing without symbols
> # To display the perf.data header info, please use --header/--header-only
> options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 499  of event 'cycles'
> # Event count (approx.): 329354953
> #
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     96.70%  ex_random  ex_random          [.] main
> 
>             ---main
>                __libc_start_main
>                _start
> 
>      0.96%  ex_random  ld-2.25.so         [.] do_lookup_x
> 
>             ---do_lookup_x
>                _dl_lookup_symbol_x
>                _dl_relocate_object
>                dl_main
>                _dl_sysdep_start
>                _dl_start
>                _dl_start_user
> 
>      0.30%  ex_random  ld-2.25.so         [.] _dl_lookup_symbol_x
> ...
> ~~~~~~~~~~~~~~~~~~~~~~
> 
> Note how no inlined frames are actually shown. I think the issue here is
> that when we group by function name (the default), then multiple samples
> with different IPs will get merged. Once we display the data then in e.g.
> ui/stdio/ hist.c, we will only have one node left for this function, with
> an arbitrary IP (of the first sample, I guess).
> 
> This is problematic, because depending on the IP, different inlined call
> stacks may have been used. This is observable for the above use-case by
> changing the grouping algorithm, e.g.:
> 
> ~~~~~~~~~~~~~~~~~~~~~~~
> $ perf report --stdio --inline --no-children -g srcline
> Invalid callchain mode: srcline
> Invalid callchain order: srcline
> Failed to open [ext4], continuing without symbols
> # To display the perf.data header info, please use --header/--header-only
> options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 499  of event 'cycles'
> # Event count (approx.): 329354953
> #
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ................................
> #
>     96.70%  ex_random  ex_random          [.] main
> 
>             |--28.97%--main random.tcc:3326
>             |
>             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>             |          /home/milian/projects/kdab/training-material/addon/
> 
> profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
> 
>             |          __libc_start_main
>             |          _start
>             |
>             |--28.14%--main random.h:143
>             |
>             |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
>             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
>             |          /home/milian/projects/kdab/training-material/addon/
> 
> profiling/build/ex_random/../../ex_random/main.cpp:13 (inline)
> 
>             |          __libc_start_main
>             |          _start
> 
> ...
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> Has anyone any suggestion on how to fix this? Some ideas:
> 
> - adapt the function aggregation to also take the inlined function into
> account
> - always add inlined frames to the graph
> - keep all IPs and their cost even when aggregating functions

I have no implemented the first option from the above list and sent a patch 
for review. Feedback welcome! It seems to work quite nicely in my tests.

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-04-27 22:03 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-27 21:09 perf report --inline usability issues when sorting by function Milian Wolff
2017-04-27 22:03 ` Milian Wolff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).