* Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
@ 2017-01-07 20:13 Milian Wolff
2017-01-09 3:07 ` Jin, Yao
0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-07 20:13 UTC (permalink / raw)
To: Jin, Yao; +Cc: linux-perf-users
Hey Jin,
I have finally had the time to test your patch series. Sadly, I hit a
fundamental issue with it early on in my testing:
In my test case, I get with `perf report --no-children` e.g. the following:
42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite
|
---__hypot_finite
hypot
main
__libc_start_main
_start
Enabling --inline does not change the output, even though hypot is inlined
into main.
Looking at your code, it looks to me as if you are only trying to find
inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only
when unwinding failed (?, cf. (callchain_ret == 0)).
Is this understanding correct, or should your patch also add inliners for
nodes in the middle of the callstack?
This approach makes the whole feature unusable for my use case. In my (really
hacky) approach to adding inliners to the output of perf (https://github.com/
milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to
lookup inliners for _every_ callchain entry. The result then there is with my
branch and a `perf report -g srcline --stdio`:
42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite
|
---__hypot_finite +387
hypot +20
std::__complex_abs complex:589
std::abs<double> complex:597
drawMandelbrot mandelbrot.h:39
main main.cpp:55
__libc_start_main +241
_start +42
Note the three inliner entries in the middle of the callstack. The way I did
it, one will also see "proper" entries for inlined functions in a top-down
`perf report --children` output.
In general, your patch series looks much more polished than mine, so I would
love to get it integrated. But with your approach, we will need lots of
changes to the various browsers to get all features in. My hacky approach has
the nice property that it simply adds "normal" callchain entries for inlined
nodes - maybe you should do the same?
Thanks a lot for your work, much appreciated.
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff
@ 2017-01-09 3:07 ` Jin, Yao
2017-01-09 8:02 ` Milian Wolff
0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-09 3:07 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Hi Wolff,
Thanks a lot for your feedback.
Have you tested with the browser mode? I mean without the "--stdio" option.
Could you give your test binary then I can try to reproduce the issue? I
have tried with my tests but I could not reproduce.
Thanks
Jin Yao
On 1/8/2017 4:13 AM, Milian Wolff wrote:
> Hey Jin,
>
> I have finally had the time to test your patch series. Sadly, I hit a
> fundamental issue with it early on in my testing:
>
> In my test case, I get with `perf report --no-children` e.g. the following:
>
> 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite
> |
> ---__hypot_finite
> hypot
> main
> __libc_start_main
> _start
>
> Enabling --inline does not change the output, even though hypot is inlined
> into main.
>
> Looking at your code, it looks to me as if you are only trying to find
> inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only
> when unwinding failed (?, cf. (callchain_ret == 0)).
>
> Is this understanding correct, or should your patch also add inliners for
> nodes in the middle of the callstack?
>
> This approach makes the whole feature unusable for my use case. In my (really
> hacky) approach to adding inliners to the output of perf (https://github.com/
> milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to
> lookup inliners for _every_ callchain entry. The result then there is with my
> branch and a `perf report -g srcline --stdio`:
>
> 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite
> |
> ---__hypot_finite +387
> hypot +20
> std::__complex_abs complex:589
> std::abs<double> complex:597
> drawMandelbrot mandelbrot.h:39
> main main.cpp:55
> __libc_start_main +241
> _start +42
>
> Note the three inliner entries in the middle of the callstack. The way I did
> it, one will also see "proper" entries for inlined functions in a top-down
> `perf report --children` output.
>
> In general, your patch series looks much more polished than mine, so I would
> love to get it integrated. But with your approach, we will need lots of
> changes to the various browsers to get all features in. My hacky approach has
> the nice property that it simply adds "normal" callchain entries for inlined
> nodes - maybe you should do the same?
>
> Thanks a lot for your work, much appreciated.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-09 3:07 ` Jin, Yao
@ 2017-01-09 8:02 ` Milian Wolff
2017-01-09 8:17 ` Jin, Yao
0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-09 8:02 UTC (permalink / raw)
To: Jin, Yao; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 6725 bytes --]
On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote:
> Hi Wolff,
>
> Thanks a lot for your feedback.
>
> Have you tested with the browser mode? I mean without the "--stdio" option.
Yes, I have done that:
~~~~~~~~~~~~
milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline
milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline
6.66016e+15
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ]
~~~~~~~~~~~~
Both with and without your patch (and passing --inline) I get:
~~~~~~~~~~~~
milian@milian-kdab2:/tmp$ perf report --no-children --inline
Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
Overhead Command Shared Object Symbol
- 62.24% inline inline [.] main
main
__libc_start_main
_start
- 26.24% inline libm-2.24.so [.] __hypot_finite
- __hypot_finite
- 25.56% hypot
main
__libc_start_main
_start
0.68% 0x40ead238ed1a2cf4
- 6.85% inline libm-2.24.so [.] hypot
hypot
main
__libc_start_main
_start
~~~~~~~~~~~~
> Could you give your test binary then I can try to reproduce the issue? I
> have tried with my tests but I could not reproduce.
The following code should reproduce the issue:
~~~~~~~~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>
using namespace std;
int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 1000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
~~~~~~~~~~~~
Hope this helps
> On 1/8/2017 4:13 AM, Milian Wolff wrote:
> > Hey Jin,
> >
> > I have finally had the time to test your patch series. Sadly, I hit a
> > fundamental issue with it early on in my testing:
> >
> > In my test case, I get with `perf report --no-children` e.g. the
following:
> > 42.50% lab_mandelbrot libm-2.24.so [.]
> > __hypot_finite
> >
> > ---__hypot_finite
> >
> > hypot
> > main
> > __libc_start_main
> > _start
> >
> > Enabling --inline does not change the output, even though hypot is inlined
> > into main.
> >
> > Looking at your code, it looks to me as if you are only trying to find
> > inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and
> > only
> > when unwinding failed (?, cf. (callchain_ret == 0)).
> >
> > Is this understanding correct, or should your patch also add inliners for
> > nodes in the middle of the callstack?
> >
> > This approach makes the whole feature unusable for my use case. In my
> > (really hacky) approach to adding inliners to the output of perf
> > (https://github.com/
> > milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided
> > to lookup inliners for _every_ callchain entry. The result then there is
> > with my>
> > branch and a `perf report -g srcline --stdio`:
> > 42.50% lab_mandelbrot libm-2.24.so [.]
> > __hypot_finite
> >
> > ---__hypot_finite +387
> >
> > hypot +20
> > std::__complex_abs complex:589
> > std::abs<double> complex:597
> > drawMandelbrot mandelbrot.h:39
> > main main.cpp:55
> > __libc_start_main +241
> > _start +42
> >
> > Note the three inliner entries in the middle of the callstack. The way I
> > did it, one will also see "proper" entries for inlined functions in a
> > top-down `perf report --children` output.
> >
> > In general, your patch series looks much more polished than mine, so I
> > would love to get it integrated. But with your approach, we will need
> > lots of changes to the various browsers to get all features in. My hacky
> > approach has the nice property that it simply adds "normal" callchain
> > entries for inlined nodes - maybe you should do the same?
> >
> > Thanks a lot for your work, much appreciated.
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-09 8:02 ` Milian Wolff
@ 2017-01-09 8:17 ` Jin, Yao
2017-01-09 8:41 ` Milian Wolff
0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-09 8:17 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Hi Wolff,
Thanks a lot for your test code!
I have updated my patch. The new output for your test code (binary name
is inline) is:
# Overhead Command Shared Object Symbol
# ........ ....... ................... ..................................
#
56.76% inline inline [.] main
|
---main
__libc_start_main
_start
40.54% inline libm-2.23.so [.] __hypot_finite
|
---__hypot_finite
|
--40.09%--__hypot
main
__libc_start_main
_start
1.30% inline libm-2.23.so [.] __hypot
|
---__hypot
main
|
---/usr/include/c++/5/bits/random.h:151 (inline)
/usr/include/c++/5/bits/random.h:332 (inline)
/usr/include/c++/5/bits/random.tcc:3328 (inline)
/usr/include/c++/5/bits/random.h:185 (inline)
/usr/include/c++/5/bits/random.h:1818 (inline)
/usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
__libc_start_main
_start
The inline entry is tagged with "(inline)". Is this result within
expectation?
Thanks
Jin Yao
On 1/9/2017 4:02 PM, Milian Wolff wrote:
> On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> Thanks a lot for your feedback.
>>
>> Have you tested with the browser mode? I mean without the "--stdio" option.
> Yes, I have done that:
>
> ~~~~~~~~~~~~
> milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline
> milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline
> 6.66016e+15
> [ perf record: Woken up 5 times to write data ]
> [ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ]
> ~~~~~~~~~~~~
>
> Both with and without your patch (and passing --inline) I get:
>
> ~~~~~~~~~~~~
> milian@milian-kdab2:/tmp$ perf report --no-children --inline
> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
> Overhead Command Shared Object Symbol
> - 62.24% inline inline [.] main
> main
> __libc_start_main
> _start
> - 26.24% inline libm-2.24.so [.] __hypot_finite
> - __hypot_finite
> - 25.56% hypot
> main
> __libc_start_main
> _start
> 0.68% 0x40ead238ed1a2cf4
> - 6.85% inline libm-2.24.so [.] hypot
> hypot
> main
> __libc_start_main
> _start
> ~~~~~~~~~~~~
>
>> Could you give your test binary then I can try to reproduce the issue? I
>> have tried with my tests but I could not reproduce.
> The following code should reproduce the issue:
>
> ~~~~~~~~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
>
> using namespace std;
>
> int main()
> {
> uniform_real_distribution<double> uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 1000000; ++i) {
> s += norm(complex<double>(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ~~~~~~~~~~~~
>
> Hope this helps
>
>> On 1/8/2017 4:13 AM, Milian Wolff wrote:
>>> Hey Jin,
>>>
>>> I have finally had the time to test your patch series. Sadly, I hit a
>>> fundamental issue with it early on in my testing:
>>>
>>> In my test case, I get with `perf report --no-children` e.g. the
> following:
>>> 42.50% lab_mandelbrot libm-2.24.so [.]
>>> __hypot_finite
>>>
>>> ---__hypot_finite
>>>
>>> hypot
>>> main
>>> __libc_start_main
>>> _start
>>>
>>> Enabling --inline does not change the output, even though hypot is inlined
>>> into main.
>>>
>>> Looking at your code, it looks to me as if you are only trying to find
>>> inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and
>>> only
>>> when unwinding failed (?, cf. (callchain_ret == 0)).
>>>
>>> Is this understanding correct, or should your patch also add inliners for
>>> nodes in the middle of the callstack?
>>>
>>> This approach makes the whole feature unusable for my use case. In my
>>> (really hacky) approach to adding inliners to the output of perf
>>> (https://github.com/
>>> milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided
>>> to lookup inliners for _every_ callchain entry. The result then there is
>>> with my>
>>> branch and a `perf report -g srcline --stdio`:
>>> 42.50% lab_mandelbrot libm-2.24.so [.]
>>> __hypot_finite
>>>
>>> ---__hypot_finite +387
>>>
>>> hypot +20
>>> std::__complex_abs complex:589
>>> std::abs<double> complex:597
>>> drawMandelbrot mandelbrot.h:39
>>> main main.cpp:55
>>> __libc_start_main +241
>>> _start +42
>>>
>>> Note the three inliner entries in the middle of the callstack. The way I
>>> did it, one will also see "proper" entries for inlined functions in a
>>> top-down `perf report --children` output.
>>>
>>> In general, your patch series looks much more polished than mine, so I
>>> would love to get it integrated. But with your approach, we will need
>>> lots of changes to the various browsers to get all features in. My hacky
>>> approach has the nice property that it simply adds "normal" callchain
>>> entries for inlined nodes - maybe you should do the same?
>>>
>>> Thanks a lot for your work, much appreciated.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-09 8:17 ` Jin, Yao
@ 2017-01-09 8:41 ` Milian Wolff
2017-01-10 1:53 ` Jin, Yao
0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-09 8:41 UTC (permalink / raw)
To: Jin, Yao; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 4980 bytes --]
On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> Hi Wolff,
>
> Thanks a lot for your test code!
>
> I have updated my patch. The new output for your test code (binary name
> is inline) is:
>
> # Overhead Command Shared Object Symbol
> # ........ ....... ................... ..................................
> #
> 56.76% inline inline [.] main
>
> ---main
> __libc_start_main
> _start
>
> 40.54% inline libm-2.23.so [.] __hypot_finite
>
> ---__hypot_finite
>
> --40.09%--__hypot
> main
> __libc_start_main
> _start
>
> 1.30% inline libm-2.23.so [.] __hypot
>
> ---__hypot
> main
>
> ---/usr/include/c++/5/bits/random.h:151 (inline)
> /usr/include/c++/5/bits/random.h:332 (inline)
> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> /usr/include/c++/5/bits/random.h:185 (inline)
> /usr/include/c++/5/bits/random.h:1818 (inline)
> /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> __libc_start_main
> _start
>
> The inline entry is tagged with "(inline)". Is this result within
> expectation?
No, this is not the expected output. I expect to get information about the
inlined frames in the middle of the call stack, in this example that happens
for the first two major hotspots.
Below is the output from my approach to this problem using my -g srcline:
~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le
Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
Overhead Command Shared Object Symbol
- 62.24% inline inline [.] main
- 27.98% std::generate_canonical<double, 53ul,
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
random.tcc:3332
std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > random.h:1809
main ine.cpp:14
__libc_start_main +241
_start +4194346
- 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul,
true, true>::__calc random.h:143
std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
random.h:151
std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
2147483647ul>::operator() random.h:332
std::generate_canonical<double, 53ul,
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
random.tcc:3332
std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > random.h:1809
main ine.cpp:14
__libc_start_main +241
_start +4194346
+ 4.14% main inline.cpp:13
+ 2.89%
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
long, 16807ul, 0ul, 2147483647ul> > random.h:1818
+ 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
+ 0.67% std::generate_canonical<double, 53ul,
std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
random.tcc:3326
- 26.24% inline libm-2.24.so [.] __hypot_finite
- __hypot_finite +179
- 25.56% hypot +20
std::__complex_abs complex:589
std::abs<double> complex:597
std::_Norm_helper<true>::_S_do_it<double> complex:654
std::norm<double> complex:664
main ine.cpp:14
__libc_start_main +241
_start +4194346
0.68% 0x40ead238ed1a2cf4
- 6.85% inline libm-2.24.so [.] hypot
hypot +40
std::__complex_abs complex:589
std::abs<double> complex:597
std::_Norm_helper<true>::_S_do_it<double> complex:654
std::norm<double> complex:664
main ine.cpp:14
__libc_start_main +241
_start +4194346
~~~~~~~~~~~~~~
I hope you see the value of such a report versus what you are proposing.
Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-09 8:41 ` Milian Wolff
@ 2017-01-10 1:53 ` Jin, Yao
2017-01-10 15:03 ` Milian Wolff
0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-10 1:53 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Hi Wolff,
I update the patch series with adding inline searching for each
callchain entry.
Now the result tested on my skl machine is:
# Overhead Command Shared Object Symbol
# ........ ....... ................... ..................................
#
56.76% inline inline [.] main
|
---main
__libc_start_main
_start
40.54% inline libm-2.23.so [.] __hypot_finite
|
---__hypot_finite
|
--40.09%--__hypot
|
---/usr/include/c++/5/bits/random.h:151 (inline)
/usr/include/c++/5/bits/random.h:332 (inline)
/usr/include/c++/5/bits/random.tcc:3328 (inline)
/usr/include/c++/5/bits/random.h:185 (inline)
/usr/include/c++/5/bits/random.h:1818 (inline)
/usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
main
__libc_start_main
_start
1.30% inline libm-2.23.so [.] __hypot
|
---__hypot
|
---/usr/include/c++/5/bits/random.h:151 (inline)
/usr/include/c++/5/bits/random.h:332 (inline)
/usr/include/c++/5/bits/random.tcc:3328 (inline)
/usr/include/c++/5/bits/random.h:185 (inline)
/usr/include/c++/5/bits/random.h:1818 (inline)
/usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
main
__libc_start_main
_start
Thanks
Jin Yao
On 1/9/2017 4:41 PM, Milian Wolff wrote:
> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> Thanks a lot for your test code!
>>
>> I have updated my patch. The new output for your test code (binary name
>> is inline) is:
>>
>> # Overhead Command Shared Object Symbol
>> # ........ ....... ................... ..................................
>> #
>> 56.76% inline inline [.] main
>>
>> ---main
>> __libc_start_main
>> _start
>>
>> 40.54% inline libm-2.23.so [.] __hypot_finite
>>
>> ---__hypot_finite
>>
>> --40.09%--__hypot
>> main
>> __libc_start_main
>> _start
>>
>> 1.30% inline libm-2.23.so [.] __hypot
>>
>> ---__hypot
>> main
>>
>> ---/usr/include/c++/5/bits/random.h:151 (inline)
>> /usr/include/c++/5/bits/random.h:332 (inline)
>> /usr/include/c++/5/bits/random.tcc:3328 (inline)
>> /usr/include/c++/5/bits/random.h:185 (inline)
>> /usr/include/c++/5/bits/random.h:1818 (inline)
>> /usr/include/c++/5/bits/random.h:1809 (inline)
>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>> __libc_start_main
>> _start
>>
>> The inline entry is tagged with "(inline)". Is this result within
>> expectation?
> No, this is not the expected output. I expect to get information about the
> inlined frames in the middle of the call stack, in this example that happens
> for the first two major hotspots.
>
> Below is the output from my approach to this problem using my -g srcline:
>
> ~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le
> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
> Overhead Command Shared Object Symbol
> - 62.24% inline inline [.] main
> - 27.98% std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3332
> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> main ine.cpp:14
> __libc_start_main +241
> _start +4194346
> - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul,
> true, true>::__calc random.h:143
> std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
> random.h:151
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> 2147483647ul>::operator() random.h:332
> std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3332
> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long,
> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> main ine.cpp:14
> __libc_start_main +241
> _start +4194346
> + 4.14% main inline.cpp:13
> + 2.89%
> std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned
> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> + 0.67% std::generate_canonical<double, 53ul,
> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
> random.tcc:3326
> - 26.24% inline libm-2.24.so [.] __hypot_finite
> - __hypot_finite +179
> - 25.56% hypot +20
> std::__complex_abs complex:589
> std::abs<double> complex:597
> std::_Norm_helper<true>::_S_do_it<double> complex:654
> std::norm<double> complex:664
> main ine.cpp:14
> __libc_start_main +241
> _start +4194346
> 0.68% 0x40ead238ed1a2cf4
> - 6.85% inline libm-2.24.so [.] hypot
> hypot +40
> std::__complex_abs complex:589
> std::abs<double> complex:597
> std::_Norm_helper<true>::_S_do_it<double> complex:654
> std::norm<double> complex:664
> main ine.cpp:14
> __libc_start_main +241
> _start +4194346
> ~~~~~~~~~~~~~~
>
> I hope you see the value of such a report versus what you are proposing.
>
> Thanks
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-10 1:53 ` Jin, Yao
@ 2017-01-10 15:03 ` Milian Wolff
2017-01-11 1:01 ` Jin, Yao
0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2017-01-10 15:03 UTC (permalink / raw)
To: Jin, Yao; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 8129 bytes --]
On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
> Hi Wolff,
>
> I update the patch series with adding inline searching for each
> callchain entry.
Hey Yao,
great! Is your git branch publicly accessible somewhere, so I can check it
out?
> Now the result tested on my skl machine is:
>
> # Overhead Command Shared Object Symbol
> # ........ ....... ................... ..................................
> #
> 56.76% inline inline [.] main
>
> ---main
> __libc_start_main
> _start
This is still wrong, no? Tons of stuff gets inlined into main and it seems as
if your code is still not capable of displaying what that is. Please refer to
the output I pasted from my patch in the previous mail. I'm explicitly looking
for the cost of using <random> e.g.
> 40.54% inline libm-2.23.so [.] __hypot_finite
>
> ---__hypot_finite
>
> --40.09%--__hypot
>
> ---/usr/include/c++/5/bits/random.h:151 (inline)
> /usr/include/c++/5/bits/random.h:332 (inline)
> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> /usr/include/c++/5/bits/random.h:185 (inline)
> /usr/include/c++/5/bits/random.h:1818 (inline)
> /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> main
> __libc_start_main
> _start
Much better, great work! But again I have to ask the same question:
Why not integrate directly with the other call stack nodes? This would solve
the following problems:
- correct display of children cost for inlined frames, i.e. in a `perf report
--children --inline`, I suspect your branch won't actually show any top-level
entries for inline.cpp:14, correct?
- display/aggregation: currently you are always displaying srcline in your
approach, what if I want to display/aggregate by function symbol instead?
So, could we please discuss this please?
Thanks a lot
> On 1/9/2017 4:41 PM, Milian Wolff wrote:
> > On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> >> Hi Wolff,
> >>
> >> Thanks a lot for your test code!
> >>
> >> I have updated my patch. The new output for your test code (binary name
> >> is inline) is:
> >>
> >> # Overhead Command Shared Object Symbol
> >> # ........ ....... ...................
> >> ..................................
> >> #
> >>
> >> 56.76% inline inline [.] main
> >>
> >> ---main
> >>
> >> __libc_start_main
> >> _start
> >>
> >> 40.54% inline libm-2.23.so [.] __hypot_finite
> >>
> >> ---__hypot_finite
> >>
> >> --40.09%--__hypot
> >>
> >> main
> >> __libc_start_main
> >> _start
> >>
> >> 1.30% inline libm-2.23.so [.] __hypot
> >>
> >> ---__hypot
> >>
> >> main
> >>
> >> ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>
> >> /usr/include/c++/5/bits/random.h:332 (inline)
> >> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >> /usr/include/c++/5/bits/random.h:185 (inline)
> >> /usr/include/c++/5/bits/random.h:1818 (inline)
> >> /usr/include/c++/5/bits/random.h:1809 (inline)
> >>
> >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >>
> >> __libc_start_main
> >> _start
> >>
> >> The inline entry is tagged with "(inline)". Is this result within
> >> expectation?
> >
> > No, this is not the expected output. I expect to get information about the
> > inlined frames in the middle of the call stack, in this example that
> > happens for the first two major hotspots.
> >
> > Below is the output from my approach to this problem using my -g srcline:
> >
> > ~~~~~~~~~~~~~~ pasted here for readability:
> > https://paste.kde.org/pmq3bc1le
> > Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
> >
> > Overhead Command Shared Object Symbol
> >
> > - 62.24% inline inline [.] main
> >
> > - 27.98% std::generate_canonical<double, 53ul,
> >
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3332
> >
> > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
> > long,
> >
> > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >
> > std::uniform_real_distribution<double>::operator()<std::linear_co
> > ngruential_engine<unsigned>
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >
> > std::uniform_real_distribution<double>::operator()<std::linear_co
> > ngruential_engine<unsigned>
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >
> > main ine.cpp:14
> > __libc_start_main +241
> > _start +4194346
> >
> > - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
> > 0ul,
> >
> > true, true>::__calc random.h:143
> >
> > std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
> >
> > random.h:151
> >
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >
> > 2147483647ul>::operator() random.h:332
> >
> > std::generate_canonical<double, 53ul,
> >
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3332
> >
> > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
> > long,
> >
> > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >
> > std::uniform_real_distribution<double>::operator()<std::linear_co
> > ngruential_engine<unsigned>
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >
> > std::uniform_real_distribution<double>::operator()<std::linear_co
> > ngruential_engine<unsigned>
> > long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >
> > main ine.cpp:14
> > __libc_start_main +241
> > _start +4194346
> >
> > + 4.14% main inline.cpp:13
> > + 2.89%
> >
> > std::uniform_real_distribution<double>::operator()<std::linear_congruentia
> > l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >
> > + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> > + 0.67% std::generate_canonical<double, 53ul,
> >
> > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
> > > random.tcc:3326
> > - 26.24% inline libm-2.24.so [.] __hypot_finite
> >
> > - __hypot_finite +179
> >
> > - 25.56% hypot +20
> >
> > std::__complex_abs complex:589
> > std::abs<double> complex:597
> > std::_Norm_helper<true>::_S_do_it<double> complex:654
> > std::norm<double> complex:664
> > main ine.cpp:14
> > __libc_start_main +241
> > _start +4194346
> >
> > 0.68% 0x40ead238ed1a2cf4
> >
> > - 6.85% inline libm-2.24.so [.] hypot
> >
> > hypot +40
> > std::__complex_abs complex:589
> > std::abs<double> complex:597
> > std::_Norm_helper<true>::_S_do_it<double> complex:654
> > std::norm<double> complex:664
> > main ine.cpp:14
> > __libc_start_main +241
> > _start +4194346
> >
> > ~~~~~~~~~~~~~~
> >
> > I hope you see the value of such a report versus what you are proposing.
> >
> > Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-10 15:03 ` Milian Wolff
@ 2017-01-11 1:01 ` Jin, Yao
2017-01-13 8:56 ` Milian Wolff
0 siblings, 1 reply; 9+ messages in thread
From: Jin, Yao @ 2017-01-11 1:01 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users
Hi Wolff,
See my comments below
Thanks
Jin Yao
On 1/10/2017 11:03 PM, Milian Wolff wrote:
> On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
>> Hi Wolff,
>>
>> I update the patch series with adding inline searching for each
>> callchain entry.
> Hey Yao,
>
> great! Is your git branch publicly accessible somewhere, so I can check it
> out?
Sorry, no public git branch yet, but I will send my patch series to you
in another mail thread.
>> Now the result tested on my skl machine is:
>>
>> # Overhead Command Shared Object Symbol
>> # ........ ....... ................... ..................................
>> #
>> 56.76% inline inline [.] main
>>
>> ---main
>> __libc_start_main
>> _start
> This is still wrong, no? Tons of stuff gets inlined into main and it seems as
> if your code is still not capable of displaying what that is. Please refer to
> the output I pasted from my patch in the previous mail. I'm explicitly looking
> for the cost of using <random> e.g.
That might be not an issue. I have checked rip of each callchain entry
and use "addr2line -e <binary> -i <rip>" to verify if it's in inline
function, same result.
There is probably different rip getting sampled or different sortkey
used. The previous example is:
perf report --no-children --stdio --inline
If I use the perf report --children --stdio --inline, the result is
# Children Self Command Shared Object Symbol
# ........ ........ ....... ...................
..................................
#
98.15% 56.76% inline inline [.] main
|
|--56.76%--_start
| __libc_start_main
| main
|
|
---/usr/include/c++/5/bits/random.h:151 (inline)
/usr/include/c++/5/bits/random.h:332 (inline)
/usr/include/c++/5/bits/random.tcc:3328 (inline)
/usr/include/c++/5/bits/random.h:185 (inline)
/usr/include/c++/5/bits/random.h:1818 (inline)
/usr/include/c++/5/bits/random.h:1809 (inline)
/home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
--41.39%--main
__hypot
|
--40.09%--__hypot_finite
Anyway, you can try my latest patch series and check what happens on
your machine.
>> 40.54% inline libm-2.23.so [.] __hypot_finite
>>
>> ---__hypot_finite
>>
>> --40.09%--__hypot
>>
>> ---/usr/include/c++/5/bits/random.h:151 (inline)
>> /usr/include/c++/5/bits/random.h:332 (inline)
>> /usr/include/c++/5/bits/random.tcc:3328 (inline)
>> /usr/include/c++/5/bits/random.h:185 (inline)
>> /usr/include/c++/5/bits/random.h:1818 (inline)
>> /usr/include/c++/5/bits/random.h:1809 (inline)
>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>> main
>> __libc_start_main
>> _start
> Much better, great work! But again I have to ask the same question:
>
> Why not integrate directly with the other call stack nodes? This would solve
> the following problems:
>
> - correct display of children cost for inlined frames, i.e. in a `perf report
> --children --inline`, I suspect your branch won't actually show any top-level
> entries for inline.cpp:14, correct?
>
> - display/aggregation: currently you are always displaying srcline in your
> approach, what if I want to display/aggregate by function symbol instead?
I just follow the behavior of addr2line. If I use the command line
"addr2line -e <binary> -i <rip>", it just display the srcline on my
machine.
> So, could we please discuss this please?
>
> Thanks a lot
>
>> On 1/9/2017 4:41 PM, Milian Wolff wrote:
>>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
>>>> Hi Wolff,
>>>>
>>>> Thanks a lot for your test code!
>>>>
>>>> I have updated my patch. The new output for your test code (binary name
>>>> is inline) is:
>>>>
>>>> # Overhead Command Shared Object Symbol
>>>> # ........ ....... ...................
>>>> ..................................
>>>> #
>>>>
>>>> 56.76% inline inline [.] main
>>>>
>>>> ---main
>>>>
>>>> __libc_start_main
>>>> _start
>>>>
>>>> 40.54% inline libm-2.23.so [.] __hypot_finite
>>>>
>>>> ---__hypot_finite
>>>>
>>>> --40.09%--__hypot
>>>>
>>>> main
>>>> __libc_start_main
>>>> _start
>>>>
>>>> 1.30% inline libm-2.23.so [.] __hypot
>>>>
>>>> ---__hypot
>>>>
>>>> main
>>>>
>>>> ---/usr/include/c++/5/bits/random.h:151 (inline)
>>>>
>>>> /usr/include/c++/5/bits/random.h:332 (inline)
>>>> /usr/include/c++/5/bits/random.tcc:3328 (inline)
>>>> /usr/include/c++/5/bits/random.h:185 (inline)
>>>> /usr/include/c++/5/bits/random.h:1818 (inline)
>>>> /usr/include/c++/5/bits/random.h:1809 (inline)
>>>>
>>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
>>>>
>>>> __libc_start_main
>>>> _start
>>>>
>>>> The inline entry is tagged with "(inline)". Is this result within
>>>> expectation?
>>> No, this is not the expected output. I expect to get information about the
>>> inlined frames in the middle of the call stack, in this example that
>>> happens for the first two major hotspots.
>>>
>>> Below is the output from my approach to this problem using my -g srcline:
>>>
>>> ~~~~~~~~~~~~~~ pasted here for readability:
>>> https://paste.kde.org/pmq3bc1le
>>> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
>>>
>>> Overhead Command Shared Object Symbol
>>>
>>> - 62.24% inline inline [.] main
>>>
>>> - 27.98% std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3332
>>> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
>>> long,
>>>
>>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_co
>>> ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_co
>>> ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>>>
>>> main ine.cpp:14
>>> __libc_start_main +241
>>> _start +4194346
>>>
>>> - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
>>> 0ul,
>>>
>>> true, true>::__calc random.h:143
>>>
>>> std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>
>>>
>>> random.h:151
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
>>>
>>> 2147483647ul>::operator() random.h:332
>>>
>>> std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3332
>>> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned
>>> long,
>>>
>>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_co
>>> ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_co
>>> ngruential_engine<unsigned>
>>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
>>>
>>> main ine.cpp:14
>>> __libc_start_main +241
>>> _start +4194346
>>>
>>> + 4.14% main inline.cpp:13
>>> + 2.89%
>>>
>>> std::uniform_real_distribution<double>::operator()<std::linear_congruentia
>>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
>>>
>>> + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
>>> + 0.67% std::generate_canonical<double, 53ul,
>>>
>>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>
>>>> random.tcc:3326
>>> - 26.24% inline libm-2.24.so [.] __hypot_finite
>>>
>>> - __hypot_finite +179
>>>
>>> - 25.56% hypot +20
>>>
>>> std::__complex_abs complex:589
>>> std::abs<double> complex:597
>>> std::_Norm_helper<true>::_S_do_it<double> complex:654
>>> std::norm<double> complex:664
>>> main ine.cpp:14
>>> __libc_start_main +241
>>> _start +4194346
>>>
>>> 0.68% 0x40ead238ed1a2cf4
>>>
>>> - 6.85% inline libm-2.24.so [.] hypot
>>>
>>> hypot +40
>>> std::__complex_abs complex:589
>>> std::abs<double> complex:597
>>> std::_Norm_helper<true>::_S_do_it<double> complex:654
>>> std::norm<double> complex:664
>>> main ine.cpp:14
>>> __libc_start_main +241
>>> _start +4194346
>>>
>>> ~~~~~~~~~~~~~~
>>>
>>> I hope you see the value of such a report versus what you are proposing.
>>>
>>> Thanks
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack
2017-01-11 1:01 ` Jin, Yao
@ 2017-01-13 8:56 ` Milian Wolff
0 siblings, 0 replies; 9+ messages in thread
From: Milian Wolff @ 2017-01-13 8:56 UTC (permalink / raw)
To: Jin, Yao; +Cc: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 11686 bytes --]
On Wednesday, January 11, 2017 9:01:44 AM CET Jin, Yao wrote:
> Hi Wolff,
>
> See my comments below
>
> Thanks
Hey Yao, much appreciated!
> On 1/10/2017 11:03 PM, Milian Wolff wrote:
> > On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote:
> >> Hi Wolff,
> >>
> >> I update the patch series with adding inline searching for each
> >> callchain entry.
> >
> > Hey Yao,
> >
> > great! Is your git branch publicly accessible somewhere, so I can check it
> > out?
>
> Sorry, no public git branch yet, but I will send my patch series to you
> in another mail thread.
Perfect, I'll apply and test it later today, I think.
> >> Now the result tested on my skl machine is:
> >>
> >> # Overhead Command Shared Object Symbol
> >> # ........ ....... ...................
> >> ..................................
> >> #
> >>
> >> 56.76% inline inline [.] main
> >>
> >> ---main
> >>
> >> __libc_start_main
> >> _start
> >
> > This is still wrong, no? Tons of stuff gets inlined into main and it seems
> > as if your code is still not capable of displaying what that is. Please
> > refer to the output I pasted from my patch in the previous mail. I'm
> > explicitly looking for the cost of using <random> e.g.
>
> That might be not an issue. I have checked rip of each callchain entry
> and use "addr2line -e <binary> -i <rip>" to verify if it's in inline
> function, same result.
Ah, this may actually be the following issue:
https://lists.fedorahosted.org/archives/list/elfutils-devel@lists.fedorahosted.org/thread/X5XUOW4DQ6DNE5SL4QPG3MGGQPJUVSCW/
#G2L4JBSBRDZDZRPL6MF46SSPER5UME2G
I'll test and see if addr - 1 produces better inlining information, I have
applied this workaround in my branch. But, of course, we'll need to fix this
properly, based on the approach that elfutils is taking.
> There is probably different rip getting sampled or different sortkey
> used. The previous example is:
> perf report --no-children --stdio --inline
>
> If I use the perf report --children --stdio --inline, the result is
>
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ...................
> ..................................
> #
> 98.15% 56.76% inline inline [.] main
>
> |--56.76%--_start
> |
> | __libc_start_main
> | main
>
> ---/usr/include/c++/5/bits/random.h:151 (inline)
> /usr/include/c++/5/bits/random.h:332 (inline)
> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> /usr/include/c++/5/bits/random.h:185 (inline)
> /usr/include/c++/5/bits/random.h:1818 (inline)
> /usr/include/c++/5/bits/random.h:1809 (inline)
> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> --41.39%--main
> __hypot
>
> --40.09%--__hypot_finite
>
> Anyway, you can try my latest patch series and check what happens on
> your machine.
I will, thanks a lot again! Much appreciated.
Cheers
> >> 40.54% inline libm-2.23.so [.] __hypot_finite
> >>
> >> ---__hypot_finite
> >>
> >> --40.09%--__hypot
> >>
> >> ---/usr/include/c++/5/bits/random.h:151
> >> (inline)
> >>
> >> /usr/include/c++/5/bits/random.h:332
> >> (inline)
> >>
> >> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>
> >> /usr/include/c++/5/bits/random.h:185
> >> (inline)
> >> /usr/include/c++/5/bits/random.h:1818
> >> (inline)
> >>
> >> /usr/include/c++/5/bits/random.h:1809 (inline)
> >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >>
> >> main
> >> __libc_start_main
> >> _start
> >
> > Much better, great work! But again I have to ask the same question:
> >
> > Why not integrate directly with the other call stack nodes? This would
> > solve the following problems:
> >
> > - correct display of children cost for inlined frames, i.e. in a `perf
> > report --children --inline`, I suspect your branch won't actually show
> > any top-level entries for inline.cpp:14, correct?
> >
> > - display/aggregation: currently you are always displaying srcline in your
> > approach, what if I want to display/aggregate by function symbol instead?
>
> I just follow the behavior of addr2line. If I use the command line
> "addr2line -e <binary> -i <rip>", it just display the srcline on my
> machine.
>
> > So, could we please discuss this please?
> >
> > Thanks a lot
> >
> >> On 1/9/2017 4:41 PM, Milian Wolff wrote:
> >>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote:
> >>>> Hi Wolff,
> >>>>
> >>>> Thanks a lot for your test code!
> >>>>
> >>>> I have updated my patch. The new output for your test code (binary name
> >>>> is inline) is:
> >>>>
> >>>> # Overhead Command Shared Object Symbol
> >>>> # ........ ....... ...................
> >>>> ..................................
> >>>> #
> >>>>
> >>>> 56.76% inline inline [.] main
> >>>>
> >>>> ---main
> >>>>
> >>>> __libc_start_main
> >>>> _start
> >>>>
> >>>> 40.54% inline libm-2.23.so [.] __hypot_finite
> >>>>
> >>>> ---__hypot_finite
> >>>>
> >>>> --40.09%--__hypot
> >>>>
> >>>> main
> >>>> __libc_start_main
> >>>> _start
> >>>>
> >>>> 1.30% inline libm-2.23.so [.] __hypot
> >>>>
> >>>> ---__hypot
> >>>>
> >>>> main
> >>>>
> >>>> ---/usr/include/c++/5/bits/random.h:151 (inline)
> >>>>
> >>>> /usr/include/c++/5/bits/random.h:332 (inline)
> >>>> /usr/include/c++/5/bits/random.tcc:3328 (inline)
> >>>> /usr/include/c++/5/bits/random.h:185 (inline)
> >>>> /usr/include/c++/5/bits/random.h:1818 (inline)
> >>>> /usr/include/c++/5/bits/random.h:1809 (inline)
> >>>>
> >>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline)
> >>>>
> >>>> __libc_start_main
> >>>> _start
> >>>>
> >>>> The inline entry is tagged with "(inline)". Is this result within
> >>>> expectation?
> >>>
> >>> No, this is not the expected output. I expect to get information about
> >>> the
> >>> inlined frames in the middle of the call stack, in this example that
> >>> happens for the first two major hotspots.
> >>>
> >>> Below is the output from my approach to this problem using my -g
> >>> srcline:
> >>>
> >>> ~~~~~~~~~~~~~~ pasted here for readability:
> >>> https://paste.kde.org/pmq3bc1le
> >>> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499
> >>>
> >>> Overhead Command Shared Object Symbol
> >>>
> >>> - 62.24% inline inline [.] main
> >>>
> >>> - 27.98% std::generate_canonical<double, 53ul,
> >>>
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>>
> >>>> random.tcc:3332
> >>>>
> >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsign
> >>> ed
> >>> long,
> >>>
> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>>
> >>> std::uniform_real_distribution<double>::operator()<std::linear
> >>> _co
> >>> ngruential_engine<unsigned>
> >>>
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>
> >>> std::uniform_real_distribution<double>::operator()<std::linear
> >>> _co
> >>> ngruential_engine<unsigned>
> >>>
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>>
> >>> main ine.cpp:14
> >>> __libc_start_main +241
> >>> _start +4194346
> >>>
> >>> - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul,
> >>> 0ul,
> >>>
> >>> true, true>::__calc random.h:143
> >>>
> >>> std::__detail::__mod<unsigned long, 2147483647ul, 16807ul,
> >>> 0ul>
> >>>
> >>> random.h:151
> >>>
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>>
> >>> 2147483647ul>::operator() random.h:332
> >>>
> >>> std::generate_canonical<double, 53ul,
> >>>
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>>
> >>>> random.tcc:3332
> >>>>
> >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsign
> >>> ed
> >>> long,
> >>>
> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185
> >>>
> >>> std::uniform_real_distribution<double>::operator()<std::linear
> >>> _co
> >>> ngruential_engine<unsigned>
> >>>
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>
> >>> std::uniform_real_distribution<double>::operator()<std::linear
> >>> _co
> >>> ngruential_engine<unsigned>
> >>>
> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809
> >>>
> >>> main ine.cpp:14
> >>> __libc_start_main +241
> >>> _start +4194346
> >>>
> >>> + 4.14% main inline.cpp:13
> >>> + 2.89%
> >>>
> >>> std::uniform_real_distribution<double>::operator()<std::linear_congruent
> >>> ia
> >>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818
> >>>
> >>> + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655
> >>> + 0.67% std::generate_canonical<double, 53ul,
> >>>
> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul,
> >>> 2147483647ul>
> >>>
> >>>> random.tcc:3326
> >>>
> >>> - 26.24% inline libm-2.24.so [.] __hypot_finite
> >>>
> >>> - __hypot_finite +179
> >>>
> >>> - 25.56% hypot +20
> >>>
> >>> std::__complex_abs complex:589
> >>> std::abs<double> complex:597
> >>> std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>> std::norm<double> complex:664
> >>> main ine.cpp:14
> >>> __libc_start_main +241
> >>> _start +4194346
> >>>
> >>> 0.68% 0x40ead238ed1a2cf4
> >>>
> >>> - 6.85% inline libm-2.24.so [.] hypot
> >>>
> >>> hypot +40
> >>> std::__complex_abs complex:589
> >>> std::abs<double> complex:597
> >>> std::_Norm_helper<true>::_S_do_it<double> complex:654
> >>> std::norm<double> complex:664
> >>> main ine.cpp:14
> >>> __libc_start_main +241
> >>> _start +4194346
> >>>
> >>> ~~~~~~~~~~~~~~
> >>>
> >>> I hope you see the value of such a report versus what you are proposing.
> >>>
> >>> Thanks
--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-01-13 8:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff
2017-01-09 3:07 ` Jin, Yao
2017-01-09 8:02 ` Milian Wolff
2017-01-09 8:17 ` Jin, Yao
2017-01-09 8:41 ` Milian Wolff
2017-01-10 1:53 ` Jin, Yao
2017-01-10 15:03 ` Milian Wolff
2017-01-11 1:01 ` Jin, Yao
2017-01-13 8:56 ` Milian Wolff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).