* Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack @ 2017-01-07 20:13 Milian Wolff 2017-01-09 3:07 ` Jin, Yao 0 siblings, 1 reply; 9+ messages in thread From: Milian Wolff @ 2017-01-07 20:13 UTC (permalink / raw) To: Jin, Yao; +Cc: linux-perf-users Hey Jin, I have finally had the time to test your patch series. Sadly, I hit a fundamental issue with it early on in my testing: In my test case, I get with `perf report --no-children` e.g. the following: 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite | ---__hypot_finite hypot main __libc_start_main _start Enabling --inline does not change the output, even though hypot is inlined into main. Looking at your code, it looks to me as if you are only trying to find inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only when unwinding failed (?, cf. (callchain_ret == 0)). Is this understanding correct, or should your patch also add inliners for nodes in the middle of the callstack? This approach makes the whole feature unusable for my use case. In my (really hacky) approach to adding inliners to the output of perf (https://github.com/ milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to lookup inliners for _every_ callchain entry. The result then there is with my branch and a `perf report -g srcline --stdio`: 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite | ---__hypot_finite +387 hypot +20 std::__complex_abs complex:589 std::abs<double> complex:597 drawMandelbrot mandelbrot.h:39 main main.cpp:55 __libc_start_main +241 _start +42 Note the three inliner entries in the middle of the callstack. The way I did it, one will also see "proper" entries for inlined functions in a top-down `perf report --children` output. In general, your patch series looks much more polished than mine, so I would love to get it integrated. But with your approach, we will need lots of changes to the various browsers to get all features in. My hacky approach has the nice property that it simply adds "normal" callchain entries for inlined nodes - maybe you should do the same? Thanks a lot for your work, much appreciated. -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff @ 2017-01-09 3:07 ` Jin, Yao 2017-01-09 8:02 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Jin, Yao @ 2017-01-09 3:07 UTC (permalink / raw) To: Milian Wolff; +Cc: linux-perf-users Hi Wolff, Thanks a lot for your feedback. Have you tested with the browser mode? I mean without the "--stdio" option. Could you give your test binary then I can try to reproduce the issue? I have tried with my tests but I could not reproduce. Thanks Jin Yao On 1/8/2017 4:13 AM, Milian Wolff wrote: > Hey Jin, > > I have finally had the time to test your patch series. Sadly, I hit a > fundamental issue with it early on in my testing: > > In my test case, I get with `perf report --no-children` e.g. the following: > > 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite > | > ---__hypot_finite > hypot > main > __libc_start_main > _start > > Enabling --inline does not change the output, even though hypot is inlined > into main. > > Looking at your code, it looks to me as if you are only trying to find > inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and only > when unwinding failed (?, cf. (callchain_ret == 0)). > > Is this understanding correct, or should your patch also add inliners for > nodes in the middle of the callstack? > > This approach makes the whole feature unusable for my use case. In my (really > hacky) approach to adding inliners to the output of perf (https://github.com/ > milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided to > lookup inliners for _every_ callchain entry. The result then there is with my > branch and a `perf report -g srcline --stdio`: > > 42.50% lab_mandelbrot libm-2.24.so [.] __hypot_finite > | > ---__hypot_finite +387 > hypot +20 > std::__complex_abs complex:589 > std::abs<double> complex:597 > drawMandelbrot mandelbrot.h:39 > main main.cpp:55 > __libc_start_main +241 > _start +42 > > Note the three inliner entries in the middle of the callstack. The way I did > it, one will also see "proper" entries for inlined functions in a top-down > `perf report --children` output. > > In general, your patch series looks much more polished than mine, so I would > love to get it integrated. But with your approach, we will need lots of > changes to the various browsers to get all features in. My hacky approach has > the nice property that it simply adds "normal" callchain entries for inlined > nodes - maybe you should do the same? > > Thanks a lot for your work, much appreciated. > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-09 3:07 ` Jin, Yao @ 2017-01-09 8:02 ` Milian Wolff 2017-01-09 8:17 ` Jin, Yao 0 siblings, 1 reply; 9+ messages in thread From: Milian Wolff @ 2017-01-09 8:02 UTC (permalink / raw) To: Jin, Yao; +Cc: linux-perf-users [-- Attachment #1: Type: text/plain, Size: 6725 bytes --] On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote: > Hi Wolff, > > Thanks a lot for your feedback. > > Have you tested with the browser mode? I mean without the "--stdio" option. Yes, I have done that: ~~~~~~~~~~~~ milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline 6.66016e+15 [ perf record: Woken up 5 times to write data ] [ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ] ~~~~~~~~~~~~ Both with and without your patch (and passing --inline) I get: ~~~~~~~~~~~~ milian@milian-kdab2:/tmp$ perf report --no-children --inline Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 Overhead Command Shared Object Symbol - 62.24% inline inline [.] main main __libc_start_main _start - 26.24% inline libm-2.24.so [.] __hypot_finite - __hypot_finite - 25.56% hypot main __libc_start_main _start 0.68% 0x40ead238ed1a2cf4 - 6.85% inline libm-2.24.so [.] hypot hypot main __libc_start_main _start ~~~~~~~~~~~~ > Could you give your test binary then I can try to reproduce the issue? I > have tried with my tests but I could not reproduce. The following code should reproduce the issue: ~~~~~~~~~~~~ #include <complex> #include <cmath> #include <random> #include <iostream> using namespace std; int main() { uniform_real_distribution<double> uniform(-1E5, 1E5); default_random_engine engine; double s = 0; for (int i = 0; i < 1000000; ++i) { s += norm(complex<double>(uniform(engine), uniform(engine))); } cout << s << '\n'; return 0; } ~~~~~~~~~~~~ Hope this helps > On 1/8/2017 4:13 AM, Milian Wolff wrote: > > Hey Jin, > > > > I have finally had the time to test your patch series. Sadly, I hit a > > fundamental issue with it early on in my testing: > > > > In my test case, I get with `perf report --no-children` e.g. the following: > > 42.50% lab_mandelbrot libm-2.24.so [.] > > __hypot_finite > > > > ---__hypot_finite > > > > hypot > > main > > __libc_start_main > > _start > > > > Enabling --inline does not change the output, even though hypot is inlined > > into main. > > > > Looking at your code, it looks to me as if you are only trying to find > > inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and > > only > > when unwinding failed (?, cf. (callchain_ret == 0)). > > > > Is this understanding correct, or should your patch also add inliners for > > nodes in the middle of the callstack? > > > > This approach makes the whole feature unusable for my use case. In my > > (really hacky) approach to adding inliners to the output of perf > > (https://github.com/ > > milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided > > to lookup inliners for _every_ callchain entry. The result then there is > > with my> > > branch and a `perf report -g srcline --stdio`: > > 42.50% lab_mandelbrot libm-2.24.so [.] > > __hypot_finite > > > > ---__hypot_finite +387 > > > > hypot +20 > > std::__complex_abs complex:589 > > std::abs<double> complex:597 > > drawMandelbrot mandelbrot.h:39 > > main main.cpp:55 > > __libc_start_main +241 > > _start +42 > > > > Note the three inliner entries in the middle of the callstack. The way I > > did it, one will also see "proper" entries for inlined functions in a > > top-down `perf report --children` output. > > > > In general, your patch series looks much more polished than mine, so I > > would love to get it integrated. But with your approach, we will need > > lots of changes to the various browsers to get all features in. My hacky > > approach has the nice property that it simply adds "normal" callchain > > entries for inlined nodes - maybe you should do the same? > > > > Thanks a lot for your work, much appreciated. -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-09 8:02 ` Milian Wolff @ 2017-01-09 8:17 ` Jin, Yao 2017-01-09 8:41 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Jin, Yao @ 2017-01-09 8:17 UTC (permalink / raw) To: Milian Wolff; +Cc: linux-perf-users Hi Wolff, Thanks a lot for your test code! I have updated my patch. The new output for your test code (binary name is inline) is: # Overhead Command Shared Object Symbol # ........ ....... ................... .................................. # 56.76% inline inline [.] main | ---main __libc_start_main _start 40.54% inline libm-2.23.so [.] __hypot_finite | ---__hypot_finite | --40.09%--__hypot main __libc_start_main _start 1.30% inline libm-2.23.so [.] __hypot | ---__hypot main | ---/usr/include/c++/5/bits/random.h:151 (inline) /usr/include/c++/5/bits/random.h:332 (inline) /usr/include/c++/5/bits/random.tcc:3328 (inline) /usr/include/c++/5/bits/random.h:185 (inline) /usr/include/c++/5/bits/random.h:1818 (inline) /usr/include/c++/5/bits/random.h:1809 (inline) /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) __libc_start_main _start The inline entry is tagged with "(inline)". Is this result within expectation? Thanks Jin Yao On 1/9/2017 4:02 PM, Milian Wolff wrote: > On Monday, January 9, 2017 11:07:26 AM CET Jin, Yao wrote: >> Hi Wolff, >> >> Thanks a lot for your feedback. >> >> Have you tested with the browser mode? I mean without the "--stdio" option. > Yes, I have done that: > > ~~~~~~~~~~~~ > milian@milian-kdab2:/tmp$ g++ -O2 -g inline.cpp -o inline > milian@milian-kdab2:/tmp$ perf record --call-graph dwarf ./inline > 6.66016e+15 > [ perf record: Woken up 5 times to write data ] > [ perf record: Captured and wrote 1.215 MB perf.data (149 samples) ] > ~~~~~~~~~~~~ > > Both with and without your patch (and passing --inline) I get: > > ~~~~~~~~~~~~ > milian@milian-kdab2:/tmp$ perf report --no-children --inline > Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 > Overhead Command Shared Object Symbol > - 62.24% inline inline [.] main > main > __libc_start_main > _start > - 26.24% inline libm-2.24.so [.] __hypot_finite > - __hypot_finite > - 25.56% hypot > main > __libc_start_main > _start > 0.68% 0x40ead238ed1a2cf4 > - 6.85% inline libm-2.24.so [.] hypot > hypot > main > __libc_start_main > _start > ~~~~~~~~~~~~ > >> Could you give your test binary then I can try to reproduce the issue? I >> have tried with my tests but I could not reproduce. > The following code should reproduce the issue: > > ~~~~~~~~~~~~ > #include <complex> > #include <cmath> > #include <random> > #include <iostream> > > using namespace std; > > int main() > { > uniform_real_distribution<double> uniform(-1E5, 1E5); > default_random_engine engine; > double s = 0; > for (int i = 0; i < 1000000; ++i) { > s += norm(complex<double>(uniform(engine), uniform(engine))); > } > cout << s << '\n'; > return 0; > } > ~~~~~~~~~~~~ > > Hope this helps > >> On 1/8/2017 4:13 AM, Milian Wolff wrote: >>> Hey Jin, >>> >>> I have finally had the time to test your patch series. Sadly, I hit a >>> fundamental issue with it early on in my testing: >>> >>> In my test case, I get with `perf report --no-children` e.g. the > following: >>> 42.50% lab_mandelbrot libm-2.24.so [.] >>> __hypot_finite >>> >>> ---__hypot_finite >>> >>> hypot >>> main >>> __libc_start_main >>> _start >>> >>> Enabling --inline does not change the output, even though hypot is inlined >>> into main. >>> >>> Looking at your code, it looks to me as if you are only trying to find >>> inliners for root entries, cf. hist_entry__fprintf (stdio/hist.c), and >>> only >>> when unwinding failed (?, cf. (callchain_ret == 0)). >>> >>> Is this understanding correct, or should your patch also add inliners for >>> nodes in the middle of the callstack? >>> >>> This approach makes the whole feature unusable for my use case. In my >>> (really hacky) approach to adding inliners to the output of perf >>> (https://github.com/ >>> milianw/linux/commit/71d031c9d679bfb4a4044226e8903dd80ea601b3) I decided >>> to lookup inliners for _every_ callchain entry. The result then there is >>> with my> >>> branch and a `perf report -g srcline --stdio`: >>> 42.50% lab_mandelbrot libm-2.24.so [.] >>> __hypot_finite >>> >>> ---__hypot_finite +387 >>> >>> hypot +20 >>> std::__complex_abs complex:589 >>> std::abs<double> complex:597 >>> drawMandelbrot mandelbrot.h:39 >>> main main.cpp:55 >>> __libc_start_main +241 >>> _start +42 >>> >>> Note the three inliner entries in the middle of the callstack. The way I >>> did it, one will also see "proper" entries for inlined functions in a >>> top-down `perf report --children` output. >>> >>> In general, your patch series looks much more polished than mine, so I >>> would love to get it integrated. But with your approach, we will need >>> lots of changes to the various browsers to get all features in. My hacky >>> approach has the nice property that it simply adds "normal" callchain >>> entries for inlined nodes - maybe you should do the same? >>> >>> Thanks a lot for your work, much appreciated. > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-09 8:17 ` Jin, Yao @ 2017-01-09 8:41 ` Milian Wolff 2017-01-10 1:53 ` Jin, Yao 0 siblings, 1 reply; 9+ messages in thread From: Milian Wolff @ 2017-01-09 8:41 UTC (permalink / raw) To: Jin, Yao; +Cc: linux-perf-users [-- Attachment #1: Type: text/plain, Size: 4980 bytes --] On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote: > Hi Wolff, > > Thanks a lot for your test code! > > I have updated my patch. The new output for your test code (binary name > is inline) is: > > # Overhead Command Shared Object Symbol > # ........ ....... ................... .................................. > # > 56.76% inline inline [.] main > > ---main > __libc_start_main > _start > > 40.54% inline libm-2.23.so [.] __hypot_finite > > ---__hypot_finite > > --40.09%--__hypot > main > __libc_start_main > _start > > 1.30% inline libm-2.23.so [.] __hypot > > ---__hypot > main > > ---/usr/include/c++/5/bits/random.h:151 (inline) > /usr/include/c++/5/bits/random.h:332 (inline) > /usr/include/c++/5/bits/random.tcc:3328 (inline) > /usr/include/c++/5/bits/random.h:185 (inline) > /usr/include/c++/5/bits/random.h:1818 (inline) > /usr/include/c++/5/bits/random.h:1809 (inline) > /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > __libc_start_main > _start > > The inline entry is tagged with "(inline)". Is this result within > expectation? No, this is not the expected output. I expect to get information about the inlined frames in the middle of the call stack, in this example that happens for the first two major hotspots. Below is the output from my approach to this problem using my -g srcline: ~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 Overhead Command Shared Object Symbol - 62.24% inline inline [.] main - 27.98% std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 main ine.cpp:14 __libc_start_main +241 _start +4194346 - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 main ine.cpp:14 __libc_start_main +241 _start +4194346 + 4.14% main inline.cpp:13 + 2.89% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655 + 0.67% std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 - 26.24% inline libm-2.24.so [.] __hypot_finite - __hypot_finite +179 - 25.56% hypot +20 std::__complex_abs complex:589 std::abs<double> complex:597 std::_Norm_helper<true>::_S_do_it<double> complex:654 std::norm<double> complex:664 main ine.cpp:14 __libc_start_main +241 _start +4194346 0.68% 0x40ead238ed1a2cf4 - 6.85% inline libm-2.24.so [.] hypot hypot +40 std::__complex_abs complex:589 std::abs<double> complex:597 std::_Norm_helper<true>::_S_do_it<double> complex:654 std::norm<double> complex:664 main ine.cpp:14 __libc_start_main +241 _start +4194346 ~~~~~~~~~~~~~~ I hope you see the value of such a report versus what you are proposing. Thanks -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-09 8:41 ` Milian Wolff @ 2017-01-10 1:53 ` Jin, Yao 2017-01-10 15:03 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Jin, Yao @ 2017-01-10 1:53 UTC (permalink / raw) To: Milian Wolff; +Cc: linux-perf-users Hi Wolff, I update the patch series with adding inline searching for each callchain entry. Now the result tested on my skl machine is: # Overhead Command Shared Object Symbol # ........ ....... ................... .................................. # 56.76% inline inline [.] main | ---main __libc_start_main _start 40.54% inline libm-2.23.so [.] __hypot_finite | ---__hypot_finite | --40.09%--__hypot | ---/usr/include/c++/5/bits/random.h:151 (inline) /usr/include/c++/5/bits/random.h:332 (inline) /usr/include/c++/5/bits/random.tcc:3328 (inline) /usr/include/c++/5/bits/random.h:185 (inline) /usr/include/c++/5/bits/random.h:1818 (inline) /usr/include/c++/5/bits/random.h:1809 (inline) /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) main __libc_start_main _start 1.30% inline libm-2.23.so [.] __hypot | ---__hypot | ---/usr/include/c++/5/bits/random.h:151 (inline) /usr/include/c++/5/bits/random.h:332 (inline) /usr/include/c++/5/bits/random.tcc:3328 (inline) /usr/include/c++/5/bits/random.h:185 (inline) /usr/include/c++/5/bits/random.h:1818 (inline) /usr/include/c++/5/bits/random.h:1809 (inline) /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) main __libc_start_main _start Thanks Jin Yao On 1/9/2017 4:41 PM, Milian Wolff wrote: > On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote: >> Hi Wolff, >> >> Thanks a lot for your test code! >> >> I have updated my patch. The new output for your test code (binary name >> is inline) is: >> >> # Overhead Command Shared Object Symbol >> # ........ ....... ................... .................................. >> # >> 56.76% inline inline [.] main >> >> ---main >> __libc_start_main >> _start >> >> 40.54% inline libm-2.23.so [.] __hypot_finite >> >> ---__hypot_finite >> >> --40.09%--__hypot >> main >> __libc_start_main >> _start >> >> 1.30% inline libm-2.23.so [.] __hypot >> >> ---__hypot >> main >> >> ---/usr/include/c++/5/bits/random.h:151 (inline) >> /usr/include/c++/5/bits/random.h:332 (inline) >> /usr/include/c++/5/bits/random.tcc:3328 (inline) >> /usr/include/c++/5/bits/random.h:185 (inline) >> /usr/include/c++/5/bits/random.h:1818 (inline) >> /usr/include/c++/5/bits/random.h:1809 (inline) >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) >> __libc_start_main >> _start >> >> The inline entry is tagged with "(inline)". Is this result within >> expectation? > No, this is not the expected output. I expect to get information about the > inlined frames in the middle of the call stack, in this example that happens > for the first two major hotspots. > > Below is the output from my approach to this problem using my -g srcline: > > ~~~~~~~~~~~~~~ pasted here for readability: https://paste.kde.org/pmq3bc1le > Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 > Overhead Command Shared Object Symbol > - 62.24% inline inline [.] main > - 27.98% std::generate_canonical<double, 53ul, > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > random.tcc:3332 > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned > long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned > long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > main ine.cpp:14 > __libc_start_main +241 > _start +4194346 > - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, > true, true>::__calc random.h:143 > std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> > random.h:151 > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > 2147483647ul>::operator() random.h:332 > std::generate_canonical<double, 53ul, > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > random.tcc:3332 > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned > long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned > long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > main ine.cpp:14 > __libc_start_main +241 > _start +4194346 > + 4.14% main inline.cpp:13 > + 2.89% > std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned > long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655 > + 0.67% std::generate_canonical<double, 53ul, > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > random.tcc:3326 > - 26.24% inline libm-2.24.so [.] __hypot_finite > - __hypot_finite +179 > - 25.56% hypot +20 > std::__complex_abs complex:589 > std::abs<double> complex:597 > std::_Norm_helper<true>::_S_do_it<double> complex:654 > std::norm<double> complex:664 > main ine.cpp:14 > __libc_start_main +241 > _start +4194346 > 0.68% 0x40ead238ed1a2cf4 > - 6.85% inline libm-2.24.so [.] hypot > hypot +40 > std::__complex_abs complex:589 > std::abs<double> complex:597 > std::_Norm_helper<true>::_S_do_it<double> complex:654 > std::norm<double> complex:664 > main ine.cpp:14 > __libc_start_main +241 > _start +4194346 > ~~~~~~~~~~~~~~ > > I hope you see the value of such a report versus what you are proposing. > > Thanks > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-10 1:53 ` Jin, Yao @ 2017-01-10 15:03 ` Milian Wolff 2017-01-11 1:01 ` Jin, Yao 0 siblings, 1 reply; 9+ messages in thread From: Milian Wolff @ 2017-01-10 15:03 UTC (permalink / raw) To: Jin, Yao; +Cc: linux-perf-users [-- Attachment #1: Type: text/plain, Size: 8129 bytes --] On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote: > Hi Wolff, > > I update the patch series with adding inline searching for each > callchain entry. Hey Yao, great! Is your git branch publicly accessible somewhere, so I can check it out? > Now the result tested on my skl machine is: > > # Overhead Command Shared Object Symbol > # ........ ....... ................... .................................. > # > 56.76% inline inline [.] main > > ---main > __libc_start_main > _start This is still wrong, no? Tons of stuff gets inlined into main and it seems as if your code is still not capable of displaying what that is. Please refer to the output I pasted from my patch in the previous mail. I'm explicitly looking for the cost of using <random> e.g. > 40.54% inline libm-2.23.so [.] __hypot_finite > > ---__hypot_finite > > --40.09%--__hypot > > ---/usr/include/c++/5/bits/random.h:151 (inline) > /usr/include/c++/5/bits/random.h:332 (inline) > /usr/include/c++/5/bits/random.tcc:3328 (inline) > /usr/include/c++/5/bits/random.h:185 (inline) > /usr/include/c++/5/bits/random.h:1818 (inline) > /usr/include/c++/5/bits/random.h:1809 (inline) > /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > main > __libc_start_main > _start Much better, great work! But again I have to ask the same question: Why not integrate directly with the other call stack nodes? This would solve the following problems: - correct display of children cost for inlined frames, i.e. in a `perf report --children --inline`, I suspect your branch won't actually show any top-level entries for inline.cpp:14, correct? - display/aggregation: currently you are always displaying srcline in your approach, what if I want to display/aggregate by function symbol instead? So, could we please discuss this please? Thanks a lot > On 1/9/2017 4:41 PM, Milian Wolff wrote: > > On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote: > >> Hi Wolff, > >> > >> Thanks a lot for your test code! > >> > >> I have updated my patch. The new output for your test code (binary name > >> is inline) is: > >> > >> # Overhead Command Shared Object Symbol > >> # ........ ....... ................... > >> .................................. > >> # > >> > >> 56.76% inline inline [.] main > >> > >> ---main > >> > >> __libc_start_main > >> _start > >> > >> 40.54% inline libm-2.23.so [.] __hypot_finite > >> > >> ---__hypot_finite > >> > >> --40.09%--__hypot > >> > >> main > >> __libc_start_main > >> _start > >> > >> 1.30% inline libm-2.23.so [.] __hypot > >> > >> ---__hypot > >> > >> main > >> > >> ---/usr/include/c++/5/bits/random.h:151 (inline) > >> > >> /usr/include/c++/5/bits/random.h:332 (inline) > >> /usr/include/c++/5/bits/random.tcc:3328 (inline) > >> /usr/include/c++/5/bits/random.h:185 (inline) > >> /usr/include/c++/5/bits/random.h:1818 (inline) > >> /usr/include/c++/5/bits/random.h:1809 (inline) > >> > >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > >> > >> __libc_start_main > >> _start > >> > >> The inline entry is tagged with "(inline)". Is this result within > >> expectation? > > > > No, this is not the expected output. I expect to get information about the > > inlined frames in the middle of the call stack, in this example that > > happens for the first two major hotspots. > > > > Below is the output from my approach to this problem using my -g srcline: > > > > ~~~~~~~~~~~~~~ pasted here for readability: > > https://paste.kde.org/pmq3bc1le > > Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 > > > > Overhead Command Shared Object Symbol > > > > - 62.24% inline inline [.] main > > > > - 27.98% std::generate_canonical<double, 53ul, > > > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > > random.tcc:3332 > > > > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned > > long, > > > > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > > > > std::uniform_real_distribution<double>::operator()<std::linear_co > > ngruential_engine<unsigned> > > long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > > > > std::uniform_real_distribution<double>::operator()<std::linear_co > > ngruential_engine<unsigned> > > long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > > > > main ine.cpp:14 > > __libc_start_main +241 > > _start +4194346 > > > > - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, > > 0ul, > > > > true, true>::__calc random.h:143 > > > > std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> > > > > random.h:151 > > > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > > > > 2147483647ul>::operator() random.h:332 > > > > std::generate_canonical<double, 53ul, > > > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > > random.tcc:3332 > > > > std::__detail::_Adaptor<std::linear_congruential_engine<unsigned > > long, > > > > 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > > > > std::uniform_real_distribution<double>::operator()<std::linear_co > > ngruential_engine<unsigned> > > long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > > > > std::uniform_real_distribution<double>::operator()<std::linear_co > > ngruential_engine<unsigned> > > long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > > > > main ine.cpp:14 > > __libc_start_main +241 > > _start +4194346 > > > > + 4.14% main inline.cpp:13 > > + 2.89% > > > > std::uniform_real_distribution<double>::operator()<std::linear_congruentia > > l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > > > > + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655 > > + 0.67% std::generate_canonical<double, 53ul, > > > > std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > > > random.tcc:3326 > > - 26.24% inline libm-2.24.so [.] __hypot_finite > > > > - __hypot_finite +179 > > > > - 25.56% hypot +20 > > > > std::__complex_abs complex:589 > > std::abs<double> complex:597 > > std::_Norm_helper<true>::_S_do_it<double> complex:654 > > std::norm<double> complex:664 > > main ine.cpp:14 > > __libc_start_main +241 > > _start +4194346 > > > > 0.68% 0x40ead238ed1a2cf4 > > > > - 6.85% inline libm-2.24.so [.] hypot > > > > hypot +40 > > std::__complex_abs complex:589 > > std::abs<double> complex:597 > > std::_Norm_helper<true>::_S_do_it<double> complex:654 > > std::norm<double> complex:664 > > main ine.cpp:14 > > __libc_start_main +241 > > _start +4194346 > > > > ~~~~~~~~~~~~~~ > > > > I hope you see the value of such a report versus what you are proposing. > > > > Thanks -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-10 15:03 ` Milian Wolff @ 2017-01-11 1:01 ` Jin, Yao 2017-01-13 8:56 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Jin, Yao @ 2017-01-11 1:01 UTC (permalink / raw) To: Milian Wolff; +Cc: linux-perf-users Hi Wolff, See my comments below Thanks Jin Yao On 1/10/2017 11:03 PM, Milian Wolff wrote: > On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote: >> Hi Wolff, >> >> I update the patch series with adding inline searching for each >> callchain entry. > Hey Yao, > > great! Is your git branch publicly accessible somewhere, so I can check it > out? Sorry, no public git branch yet, but I will send my patch series to you in another mail thread. >> Now the result tested on my skl machine is: >> >> # Overhead Command Shared Object Symbol >> # ........ ....... ................... .................................. >> # >> 56.76% inline inline [.] main >> >> ---main >> __libc_start_main >> _start > This is still wrong, no? Tons of stuff gets inlined into main and it seems as > if your code is still not capable of displaying what that is. Please refer to > the output I pasted from my patch in the previous mail. I'm explicitly looking > for the cost of using <random> e.g. That might be not an issue. I have checked rip of each callchain entry and use "addr2line -e <binary> -i <rip>" to verify if it's in inline function, same result. There is probably different rip getting sampled or different sortkey used. The previous example is: perf report --no-children --stdio --inline If I use the perf report --children --stdio --inline, the result is # Children Self Command Shared Object Symbol # ........ ........ ....... ................... .................................. # 98.15% 56.76% inline inline [.] main | |--56.76%--_start | __libc_start_main | main | | ---/usr/include/c++/5/bits/random.h:151 (inline) /usr/include/c++/5/bits/random.h:332 (inline) /usr/include/c++/5/bits/random.tcc:3328 (inline) /usr/include/c++/5/bits/random.h:185 (inline) /usr/include/c++/5/bits/random.h:1818 (inline) /usr/include/c++/5/bits/random.h:1809 (inline) /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) --41.39%--main __hypot | --40.09%--__hypot_finite Anyway, you can try my latest patch series and check what happens on your machine. >> 40.54% inline libm-2.23.so [.] __hypot_finite >> >> ---__hypot_finite >> >> --40.09%--__hypot >> >> ---/usr/include/c++/5/bits/random.h:151 (inline) >> /usr/include/c++/5/bits/random.h:332 (inline) >> /usr/include/c++/5/bits/random.tcc:3328 (inline) >> /usr/include/c++/5/bits/random.h:185 (inline) >> /usr/include/c++/5/bits/random.h:1818 (inline) >> /usr/include/c++/5/bits/random.h:1809 (inline) >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) >> main >> __libc_start_main >> _start > Much better, great work! But again I have to ask the same question: > > Why not integrate directly with the other call stack nodes? This would solve > the following problems: > > - correct display of children cost for inlined frames, i.e. in a `perf report > --children --inline`, I suspect your branch won't actually show any top-level > entries for inline.cpp:14, correct? > > - display/aggregation: currently you are always displaying srcline in your > approach, what if I want to display/aggregate by function symbol instead? I just follow the behavior of addr2line. If I use the command line "addr2line -e <binary> -i <rip>", it just display the srcline on my machine. > So, could we please discuss this please? > > Thanks a lot > >> On 1/9/2017 4:41 PM, Milian Wolff wrote: >>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote: >>>> Hi Wolff, >>>> >>>> Thanks a lot for your test code! >>>> >>>> I have updated my patch. The new output for your test code (binary name >>>> is inline) is: >>>> >>>> # Overhead Command Shared Object Symbol >>>> # ........ ....... ................... >>>> .................................. >>>> # >>>> >>>> 56.76% inline inline [.] main >>>> >>>> ---main >>>> >>>> __libc_start_main >>>> _start >>>> >>>> 40.54% inline libm-2.23.so [.] __hypot_finite >>>> >>>> ---__hypot_finite >>>> >>>> --40.09%--__hypot >>>> >>>> main >>>> __libc_start_main >>>> _start >>>> >>>> 1.30% inline libm-2.23.so [.] __hypot >>>> >>>> ---__hypot >>>> >>>> main >>>> >>>> ---/usr/include/c++/5/bits/random.h:151 (inline) >>>> >>>> /usr/include/c++/5/bits/random.h:332 (inline) >>>> /usr/include/c++/5/bits/random.tcc:3328 (inline) >>>> /usr/include/c++/5/bits/random.h:185 (inline) >>>> /usr/include/c++/5/bits/random.h:1818 (inline) >>>> /usr/include/c++/5/bits/random.h:1809 (inline) >>>> >>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) >>>> >>>> __libc_start_main >>>> _start >>>> >>>> The inline entry is tagged with "(inline)". Is this result within >>>> expectation? >>> No, this is not the expected output. I expect to get information about the >>> inlined frames in the middle of the call stack, in this example that >>> happens for the first two major hotspots. >>> >>> Below is the output from my approach to this problem using my -g srcline: >>> >>> ~~~~~~~~~~~~~~ pasted here for readability: >>> https://paste.kde.org/pmq3bc1le >>> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 >>> >>> Overhead Command Shared Object Symbol >>> >>> - 62.24% inline inline [.] main >>> >>> - 27.98% std::generate_canonical<double, 53ul, >>> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >>>> random.tcc:3332 >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned >>> long, >>> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 >>> >>> std::uniform_real_distribution<double>::operator()<std::linear_co >>> ngruential_engine<unsigned> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818 >>> >>> std::uniform_real_distribution<double>::operator()<std::linear_co >>> ngruential_engine<unsigned> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809 >>> >>> main ine.cpp:14 >>> __libc_start_main +241 >>> _start +4194346 >>> >>> - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, >>> 0ul, >>> >>> true, true>::__calc random.h:143 >>> >>> std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> >>> >>> random.h:151 >>> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, >>> >>> 2147483647ul>::operator() random.h:332 >>> >>> std::generate_canonical<double, 53ul, >>> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >>>> random.tcc:3332 >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsigned >>> long, >>> >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 >>> >>> std::uniform_real_distribution<double>::operator()<std::linear_co >>> ngruential_engine<unsigned> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818 >>> >>> std::uniform_real_distribution<double>::operator()<std::linear_co >>> ngruential_engine<unsigned> >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809 >>> >>> main ine.cpp:14 >>> __libc_start_main +241 >>> _start +4194346 >>> >>> + 4.14% main inline.cpp:13 >>> + 2.89% >>> >>> std::uniform_real_distribution<double>::operator()<std::linear_congruentia >>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 >>> >>> + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655 >>> + 0.67% std::generate_canonical<double, 53ul, >>> >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >>>> random.tcc:3326 >>> - 26.24% inline libm-2.24.so [.] __hypot_finite >>> >>> - __hypot_finite +179 >>> >>> - 25.56% hypot +20 >>> >>> std::__complex_abs complex:589 >>> std::abs<double> complex:597 >>> std::_Norm_helper<true>::_S_do_it<double> complex:654 >>> std::norm<double> complex:664 >>> main ine.cpp:14 >>> __libc_start_main +241 >>> _start +4194346 >>> >>> 0.68% 0x40ead238ed1a2cf4 >>> >>> - 6.85% inline libm-2.24.so [.] hypot >>> >>> hypot +40 >>> std::__complex_abs complex:589 >>> std::abs<double> complex:597 >>> std::_Norm_helper<true>::_S_do_it<double> complex:654 >>> std::norm<double> complex:664 >>> main ine.cpp:14 >>> __libc_start_main +241 >>> _start +4194346 >>> >>> ~~~~~~~~~~~~~~ >>> >>> I hope you see the value of such a report versus what you are proposing. >>> >>> Thanks > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack 2017-01-11 1:01 ` Jin, Yao @ 2017-01-13 8:56 ` Milian Wolff 0 siblings, 0 replies; 9+ messages in thread From: Milian Wolff @ 2017-01-13 8:56 UTC (permalink / raw) To: Jin, Yao; +Cc: linux-perf-users [-- Attachment #1: Type: text/plain, Size: 11686 bytes --] On Wednesday, January 11, 2017 9:01:44 AM CET Jin, Yao wrote: > Hi Wolff, > > See my comments below > > Thanks Hey Yao, much appreciated! > On 1/10/2017 11:03 PM, Milian Wolff wrote: > > On Tuesday, January 10, 2017 9:53:04 AM CET Jin, Yao wrote: > >> Hi Wolff, > >> > >> I update the patch series with adding inline searching for each > >> callchain entry. > > > > Hey Yao, > > > > great! Is your git branch publicly accessible somewhere, so I can check it > > out? > > Sorry, no public git branch yet, but I will send my patch series to you > in another mail thread. Perfect, I'll apply and test it later today, I think. > >> Now the result tested on my skl machine is: > >> > >> # Overhead Command Shared Object Symbol > >> # ........ ....... ................... > >> .................................. > >> # > >> > >> 56.76% inline inline [.] main > >> > >> ---main > >> > >> __libc_start_main > >> _start > > > > This is still wrong, no? Tons of stuff gets inlined into main and it seems > > as if your code is still not capable of displaying what that is. Please > > refer to the output I pasted from my patch in the previous mail. I'm > > explicitly looking for the cost of using <random> e.g. > > That might be not an issue. I have checked rip of each callchain entry > and use "addr2line -e <binary> -i <rip>" to verify if it's in inline > function, same result. Ah, this may actually be the following issue: https://lists.fedorahosted.org/archives/list/elfutils-devel@lists.fedorahosted.org/thread/X5XUOW4DQ6DNE5SL4QPG3MGGQPJUVSCW/ #G2L4JBSBRDZDZRPL6MF46SSPER5UME2G I'll test and see if addr - 1 produces better inlining information, I have applied this workaround in my branch. But, of course, we'll need to fix this properly, based on the approach that elfutils is taking. > There is probably different rip getting sampled or different sortkey > used. The previous example is: > perf report --no-children --stdio --inline > > If I use the perf report --children --stdio --inline, the result is > > # Children Self Command Shared Object Symbol > # ........ ........ ....... ................... > .................................. > # > 98.15% 56.76% inline inline [.] main > > |--56.76%--_start > | > | __libc_start_main > | main > > ---/usr/include/c++/5/bits/random.h:151 (inline) > /usr/include/c++/5/bits/random.h:332 (inline) > /usr/include/c++/5/bits/random.tcc:3328 (inline) > /usr/include/c++/5/bits/random.h:185 (inline) > /usr/include/c++/5/bits/random.h:1818 (inline) > /usr/include/c++/5/bits/random.h:1809 (inline) > /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > --41.39%--main > __hypot > > --40.09%--__hypot_finite > > Anyway, you can try my latest patch series and check what happens on > your machine. I will, thanks a lot again! Much appreciated. Cheers > >> 40.54% inline libm-2.23.so [.] __hypot_finite > >> > >> ---__hypot_finite > >> > >> --40.09%--__hypot > >> > >> ---/usr/include/c++/5/bits/random.h:151 > >> (inline) > >> > >> /usr/include/c++/5/bits/random.h:332 > >> (inline) > >> > >> /usr/include/c++/5/bits/random.tcc:3328 (inline) > >> > >> /usr/include/c++/5/bits/random.h:185 > >> (inline) > >> /usr/include/c++/5/bits/random.h:1818 > >> (inline) > >> > >> /usr/include/c++/5/bits/random.h:1809 (inline) > >> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > >> > >> main > >> __libc_start_main > >> _start > > > > Much better, great work! But again I have to ask the same question: > > > > Why not integrate directly with the other call stack nodes? This would > > solve the following problems: > > > > - correct display of children cost for inlined frames, i.e. in a `perf > > report --children --inline`, I suspect your branch won't actually show > > any top-level entries for inline.cpp:14, correct? > > > > - display/aggregation: currently you are always displaying srcline in your > > approach, what if I want to display/aggregate by function symbol instead? > > I just follow the behavior of addr2line. If I use the command line > "addr2line -e <binary> -i <rip>", it just display the srcline on my > machine. > > > So, could we please discuss this please? > > > > Thanks a lot > > > >> On 1/9/2017 4:41 PM, Milian Wolff wrote: > >>> On Monday, January 9, 2017 4:17:48 PM CET Jin, Yao wrote: > >>>> Hi Wolff, > >>>> > >>>> Thanks a lot for your test code! > >>>> > >>>> I have updated my patch. The new output for your test code (binary name > >>>> is inline) is: > >>>> > >>>> # Overhead Command Shared Object Symbol > >>>> # ........ ....... ................... > >>>> .................................. > >>>> # > >>>> > >>>> 56.76% inline inline [.] main > >>>> > >>>> ---main > >>>> > >>>> __libc_start_main > >>>> _start > >>>> > >>>> 40.54% inline libm-2.23.so [.] __hypot_finite > >>>> > >>>> ---__hypot_finite > >>>> > >>>> --40.09%--__hypot > >>>> > >>>> main > >>>> __libc_start_main > >>>> _start > >>>> > >>>> 1.30% inline libm-2.23.so [.] __hypot > >>>> > >>>> ---__hypot > >>>> > >>>> main > >>>> > >>>> ---/usr/include/c++/5/bits/random.h:151 (inline) > >>>> > >>>> /usr/include/c++/5/bits/random.h:332 (inline) > >>>> /usr/include/c++/5/bits/random.tcc:3328 (inline) > >>>> /usr/include/c++/5/bits/random.h:185 (inline) > >>>> /usr/include/c++/5/bits/random.h:1818 (inline) > >>>> /usr/include/c++/5/bits/random.h:1809 (inline) > >>>> > >>>> /home/jinyao/skl-ws/perf-dev/lck-2867/test/inline.cpp:14 (inline) > >>>> > >>>> __libc_start_main > >>>> _start > >>>> > >>>> The inline entry is tagged with "(inline)". Is this result within > >>>> expectation? > >>> > >>> No, this is not the expected output. I expect to get information about > >>> the > >>> inlined frames in the middle of the call stack, in this example that > >>> happens for the first two major hotspots. > >>> > >>> Below is the output from my approach to this problem using my -g > >>> srcline: > >>> > >>> ~~~~~~~~~~~~~~ pasted here for readability: > >>> https://paste.kde.org/pmq3bc1le > >>> Samples: 149 of event 'cycles:ppp', Event count (approx.): 141334499 > >>> > >>> Overhead Command Shared Object Symbol > >>> > >>> - 62.24% inline inline [.] main > >>> > >>> - 27.98% std::generate_canonical<double, 53ul, > >>> > >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > >>> 2147483647ul> > >>> > >>>> random.tcc:3332 > >>>> > >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsign > >>> ed > >>> long, > >>> > >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > >>> > >>> std::uniform_real_distribution<double>::operator()<std::linear > >>> _co > >>> ngruential_engine<unsigned> > >>> > >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > >>> > >>> std::uniform_real_distribution<double>::operator()<std::linear > >>> _co > >>> ngruential_engine<unsigned> > >>> > >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > >>> > >>> main ine.cpp:14 > >>> __libc_start_main +241 > >>> _start +4194346 > >>> > >>> - 25.88% std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, > >>> 0ul, > >>> > >>> true, true>::__calc random.h:143 > >>> > >>> std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, > >>> 0ul> > >>> > >>> random.h:151 > >>> > >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > >>> > >>> 2147483647ul>::operator() random.h:332 > >>> > >>> std::generate_canonical<double, 53ul, > >>> > >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > >>> 2147483647ul> > >>> > >>>> random.tcc:3332 > >>>> > >>> std::__detail::_Adaptor<std::linear_congruential_engine<unsign > >>> ed > >>> long, > >>> > >>> 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 > >>> > >>> std::uniform_real_distribution<double>::operator()<std::linear > >>> _co > >>> ngruential_engine<unsigned> > >>> > >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > >>> > >>> std::uniform_real_distribution<double>::operator()<std::linear > >>> _co > >>> ngruential_engine<unsigned> > >>> > >>> long, 16807ul, 0ul, 2147483647ul> > random.h:1809 > >>> > >>> main ine.cpp:14 > >>> __libc_start_main +241 > >>> _start +4194346 > >>> > >>> + 4.14% main inline.cpp:13 > >>> + 2.89% > >>> > >>> std::uniform_real_distribution<double>::operator()<std::linear_congruent > >>> ia > >>> l_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 > >>> > >>> + 0.68% std::_Norm_helper<true>::_S_do_it<double> complex:655 > >>> + 0.67% std::generate_canonical<double, 53ul, > >>> > >>> std::linear_congruential_engine<unsigned long, 16807ul, 0ul, > >>> 2147483647ul> > >>> > >>>> random.tcc:3326 > >>> > >>> - 26.24% inline libm-2.24.so [.] __hypot_finite > >>> > >>> - __hypot_finite +179 > >>> > >>> - 25.56% hypot +20 > >>> > >>> std::__complex_abs complex:589 > >>> std::abs<double> complex:597 > >>> std::_Norm_helper<true>::_S_do_it<double> complex:654 > >>> std::norm<double> complex:664 > >>> main ine.cpp:14 > >>> __libc_start_main +241 > >>> _start +4194346 > >>> > >>> 0.68% 0x40ead238ed1a2cf4 > >>> > >>> - 6.85% inline libm-2.24.so [.] hypot > >>> > >>> hypot +40 > >>> std::__complex_abs complex:589 > >>> std::abs<double> complex:597 > >>> std::_Norm_helper<true>::_S_do_it<double> complex:654 > >>> std::norm<double> complex:664 > >>> main ine.cpp:14 > >>> __libc_start_main +241 > >>> _start +4194346 > >>> > >>> ~~~~~~~~~~~~~~ > >>> > >>> I hope you see the value of such a report versus what you are proposing. > >>> > >>> Thanks -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-01-13 8:56 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-07 20:13 Feedback on: [RESEND PATCH v2 0/5] perf report: Show inline stack Milian Wolff 2017-01-09 3:07 ` Jin, Yao 2017-01-09 8:02 ` Milian Wolff 2017-01-09 8:17 ` Jin, Yao 2017-01-09 8:41 ` Milian Wolff 2017-01-10 1:53 ` Jin, Yao 2017-01-10 15:03 ` Milian Wolff 2017-01-11 1:01 ` Jin, Yao 2017-01-13 8:56 ` Milian Wolff
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).