* Perf report --percentage @ 2016-06-07 19:21 Sebastian, Shiny 2016-06-08 7:34 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Sebastian, Shiny @ 2016-06-07 19:21 UTC (permalink / raw) To: linux-perf-users@vger.kernel.org Hi, I work with Intel and am running perf tool within Linux VMs on Windows Hyper-V. With kernel 4.6 and corresponding perf version, percentages of call stacks at deeper levels do not seem to add up to 100% or parent value (26.97%): example below. I have tried the -percentage flag with both 'absolute' and 'relative' values and they don't seem to change the output at all. Is this a bug or something wrong with my perf flags ? Commands used: #perf record -a -g -c 10000 sleep 10 #perf report -i cpu3.data.old -C 1 --no-children - 27.48% fio [kernel.kallsyms] [k] __softirqentry_text_start __softirqentry_text_start - irq_exit - 26.97% smp_call_function_single_interrupt - call_function_single_interrupt + 4.21% __blk_run_queue + 4.09% hv_ringbuffer_write + 1.83% 0x99c + 1.78% __blockdev_direct_IO + 0.96% do_blockdev_direct_IO + 0.95% generic_make_request + 0.86% blk_queue_bio + 0.70% iov_iter_get_pages + 0.64% md_make_request + 0.55% do_io_submit 0.51% entry_SYSCALL_64_fastpath + 0.51% hyperv_vector_handler Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to 100%. 27.69% swapper [kernel.kallsyms] [k] intel_idle | ---intel_idle | |--99.84%-- cpuidle_enter_state | cpuidle_enter | call_cpuidle | cpu_startup_entry | | | |--97.27%-- start_secondary | | | --2.73%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.16%-- [...] I am running SLES 12. Thanks, Shiny ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-07 19:21 Perf report --percentage Sebastian, Shiny @ 2016-06-08 7:34 ` Milian Wolff 2016-06-08 16:21 ` Sebastian, Shiny 2016-06-09 15:13 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 9+ messages in thread From: Milian Wolff @ 2016-06-08 7:34 UTC (permalink / raw) To: shiny.sebastian; +Cc: linux-perf-users [-- Attachment #1: Type: text/plain, Size: 4075 bytes --] On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote: > Hi, > > I work with Intel and am running perf tool within Linux VMs on Windows > Hyper-V. With kernel 4.6 and corresponding perf version, percentages of > call stacks at deeper levels do not seem to add up to 100% or parent value > (26.97%): example below. I have tried the -percentage flag with both > 'absolute' and 'relative' values and they don't seem to change the output > at all. Is this a bug or something wrong with my perf flags ? > > Commands used: > #perf record -a -g -c 10000 sleep 10 > #perf report -i cpu3.data.old -C 1 --no-children > > - 27.48% fio [kernel.kallsyms] [k] __softirqentry_text_start > __softirqentry_text_start > - irq_exit > - 26.97% smp_call_function_single_interrupt > - call_function_single_interrupt > + 4.21% __blk_run_queue > + 4.09% hv_ringbuffer_write > + 1.83% 0x99c > + 1.78% __blockdev_direct_IO > + 0.96% do_blockdev_direct_IO > + 0.95% generic_make_request > + 0.86% blk_queue_bio > + 0.70% iov_iter_get_pages > + 0.64% md_make_request > + 0.55% do_io_submit > 0.51% entry_SYSCALL_64_fastpath > + 0.51% hyperv_vector_handler Can you try to look at the output of perf report -i cpu3.data.old -C 1 --no-children -g graph,0 i.e. potentially there are more entries with a cost of 0.5% or less, which by default falls below the threshold. That said, I just tested it and it doesn't work for me either: tmp$ perf record --call-graph dwarf kwrite [ perf record: Woken up 65 times to write data ] [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ] tmp$ perf report -g graph,0 --no-children - 9.47% kwrite ld-2.23.so [.] do_lookup_x - 2.21% do_lookup_x - _dl_lookup_symbol_x + 1.42% _dl_relocate_object + 0.74% _dl_fixup + 7.06% kwrite ld-2.23.so [.] strcmp It's also bogus when I look at it with --children: - 10.76% 9.47% kwrite ld-2.23.so [.] do_lookup_x + 1.54% _start + 1.34% do_lookup_x + 0.61% _dl_runtime_resolve_avx - 7.06% 7.06% kwrite ld-2.23.so [.] strcmp + 0.42% _start + 0.22% _dl_runtime_resolve_avx So someone else with more insight should answer whether this output can be explained somehow. > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to > 100%. 27.69% swapper [kernel.kallsyms] [k] intel_idle > > ---intel_idle > > |--99.84%-- cpuidle_enter_state > | > | cpuidle_enter > | call_cpuidle > | cpu_startup_entry > | > | |--97.27%-- start_secondary > | | > | --2.73%-- rest_init > | > | start_kernel > | x86_64_start_reservations > | x86_64_start_kernel > > --0.16%-- [...] This output can be reproduced on newer perf by passing `-g fractal` to `perf report`. I still get the issues as shown above though with a newer perf: - 9.47% kwrite ld-2.23.so [.] do_lookup_x + 23.31% do_lookup_x - 7.06% kwrite ld-2.23.so [.] strcmp + 9.06% strcmp + 5.71% kwrite ld-2.23.so [.] check_match I.e. ~74% and 91% of the samples don't seen to have a proper backtrace attribution? Cheers -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Perf report --percentage 2016-06-08 7:34 ` Milian Wolff @ 2016-06-08 16:21 ` Sebastian, Shiny 2016-06-11 8:31 ` Namhyung Kim 2016-06-09 15:13 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 9+ messages in thread From: Sebastian, Shiny @ 2016-06-08 16:21 UTC (permalink / raw) To: Milian Wolff; +Cc: linux-perf-users@vger.kernel.org Thanks Milian. I added -g graph,0 and it did add lot more percentages - it's still not adding up though. #perf report -C 1 --no-children -g graph,0 output: - 19.84% fio [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 7.88% hv_ringbuffer_write + 4.88% scsi_end_request + 3.43% mix_pool_bytes + 0.92% aio_complete + 0.85% blk_run_queue + 0.80% scsi_put_command + 0.37% dio_bio_end_aio + 0.35% do_blockdev_direct_IO + 0.07% dio_complete + 0.07% scsi_run_queue + 0.06% __blockdev_direct_IO + 0.05% scsi_io_completion + 0.04% bio_endio + 0.02% add_timer_randomness + 0.02% vmbus_sendpacket_mpb_desc + 0.01% hrtimer_start_range_ns Thanks, Shiny -----Original Message----- From: milian [mailto:milian.wolff@kdab.com] Sent: Wednesday, June 8, 2016 12:35 AM To: Sebastian, Shiny <shiny.sebastian@intel.com> Cc: linux-perf-users@vger.kernel.org Subject: Re: Perf report --percentage On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote: > Hi, > > I work with Intel and am running perf tool within Linux VMs on Windows > Hyper-V. With kernel 4.6 and corresponding perf version, percentages of > call stacks at deeper levels do not seem to add up to 100% or parent value > (26.97%): example below. I have tried the -percentage flag with both > 'absolute' and 'relative' values and they don't seem to change the output > at all. Is this a bug or something wrong with my perf flags ? > > Commands used: > #perf record -a -g -c 10000 sleep 10 > #perf report -i cpu3.data.old -C 1 --no-children > > - 27.48% fio [kernel.kallsyms] [k] __softirqentry_text_start > __softirqentry_text_start > - irq_exit > - 26.97% smp_call_function_single_interrupt > - call_function_single_interrupt > + 4.21% __blk_run_queue > + 4.09% hv_ringbuffer_write > + 1.83% 0x99c > + 1.78% __blockdev_direct_IO > + 0.96% do_blockdev_direct_IO > + 0.95% generic_make_request > + 0.86% blk_queue_bio > + 0.70% iov_iter_get_pages > + 0.64% md_make_request > + 0.55% do_io_submit > 0.51% entry_SYSCALL_64_fastpath > + 0.51% hyperv_vector_handler Can you try to look at the output of perf report -i cpu3.data.old -C 1 --no-children -g graph,0 i.e. potentially there are more entries with a cost of 0.5% or less, which by default falls below the threshold. That said, I just tested it and it doesn't work for me either: tmp$ perf record --call-graph dwarf kwrite [ perf record: Woken up 65 times to write data ] [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ] tmp$ perf report -g graph,0 --no-children - 9.47% kwrite ld-2.23.so [.] do_lookup_x - 2.21% do_lookup_x - _dl_lookup_symbol_x + 1.42% _dl_relocate_object + 0.74% _dl_fixup + 7.06% kwrite ld-2.23.so [.] strcmp It's also bogus when I look at it with --children: - 10.76% 9.47% kwrite ld-2.23.so [.] do_lookup_x + 1.54% _start + 1.34% do_lookup_x + 0.61% _dl_runtime_resolve_avx - 7.06% 7.06% kwrite ld-2.23.so [.] strcmp + 0.42% _start + 0.22% _dl_runtime_resolve_avx So someone else with more insight should answer whether this output can be explained somehow. > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to > 100%. 27.69% swapper [kernel.kallsyms] [k] intel_idle > > ---intel_idle > > |--99.84%-- cpuidle_enter_state > | > | cpuidle_enter > | call_cpuidle > | cpu_startup_entry > | > | |--97.27%-- start_secondary > | | > | --2.73%-- rest_init > | > | start_kernel > | x86_64_start_reservations > | x86_64_start_kernel > > --0.16%-- [...] This output can be reproduced on newer perf by passing `-g fractal` to `perf report`. I still get the issues as shown above though with a newer perf: - 9.47% kwrite ld-2.23.so [.] do_lookup_x + 23.31% do_lookup_x - 7.06% kwrite ld-2.23.so [.] strcmp + 9.06% strcmp + 5.71% kwrite ld-2.23.so [.] check_match I.e. ~74% and 91% of the samples don't seen to have a proper backtrace attribution? Cheers -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-08 16:21 ` Sebastian, Shiny @ 2016-06-11 8:31 ` Namhyung Kim 2016-06-11 11:53 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Namhyung Kim @ 2016-06-11 8:31 UTC (permalink / raw) To: Sebastian, Shiny; +Cc: Milian Wolff, linux-perf-users@vger.kernel.org Hello, On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote: > Thanks Milian. I added -g graph,0 and it did add lot more percentages - it's still not adding up though. > > #perf report -C 1 --no-children -g graph,0 I think it's because you used -C option to restrict samples only from cpu 1. One way to check this is adding 'cpu' to sort keys and compare sample count and callchain count. What is the result of this command? # perf report -C 1 --no-children -g graph,0,count -n -s +cpu Thanks, Namhyung > output: > - 19.84% fio [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore > - _raw_spin_unlock_irqrestore > + 7.88% hv_ringbuffer_write > + 4.88% scsi_end_request > + 3.43% mix_pool_bytes > + 0.92% aio_complete > + 0.85% blk_run_queue > + 0.80% scsi_put_command > + 0.37% dio_bio_end_aio > + 0.35% do_blockdev_direct_IO > + 0.07% dio_complete > + 0.07% scsi_run_queue > + 0.06% __blockdev_direct_IO > + 0.05% scsi_io_completion > + 0.04% bio_endio > + 0.02% add_timer_randomness > + 0.02% vmbus_sendpacket_mpb_desc > + 0.01% hrtimer_start_range_ns > > > Thanks, > Shiny ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-11 8:31 ` Namhyung Kim @ 2016-06-11 11:53 ` Milian Wolff 2016-06-12 13:55 ` Namhyung Kim 0 siblings, 1 reply; 9+ messages in thread From: Milian Wolff @ 2016-06-11 11:53 UTC (permalink / raw) To: namhyung; +Cc: shiny.sebastian, linux-perf-users [-- Attachment #1: Type: text/plain, Size: 1195 bytes --] On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote: > Hello, > > On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote: > > Thanks Milian. I added -g graph,0 and it did add lot more percentages - > > it's still not adding up though. > > > > #perf report -C 1 --no-children -g graph,0 > > I think it's because you used -C option to restrict samples only from > cpu 1. One way to check this is adding 'cpu' to sort keys and compare > sample count and callchain count. > > What is the result of this command? > > # perf report -C 1 --no-children -g graph,0,count -n -s +cpu If that is the case, how would you explain my findings? Note how I did not pass `-C 1`, and still missed up to 90% of backtraces: $ perf report --no-children -g fractal,0 - 9.47% kwrite ld-2.23.so [.] do_lookup_x + 23.31% do_lookup_x - 7.06% kwrite ld-2.23.so [.] strcmp + 9.06% strcmp + 5.71% kwrite ld-2.23.so [.] check_match -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-11 11:53 ` Milian Wolff @ 2016-06-12 13:55 ` Namhyung Kim 2016-06-13 11:01 ` Milian Wolff 0 siblings, 1 reply; 9+ messages in thread From: Namhyung Kim @ 2016-06-12 13:55 UTC (permalink / raw) To: Milian Wolff; +Cc: shiny.sebastian, linux-perf-users Hi Milian, On Sat, Jun 11, 2016 at 01:53:34PM +0200, Milian Wolff wrote: > On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote: > > Hello, > > > > On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote: > > > Thanks Milian. I added -g graph,0 and it did add lot more percentages - > > > it's still not adding up though. > > > > > > #perf report -C 1 --no-children -g graph,0 > > > > I think it's because you used -C option to restrict samples only from > > cpu 1. One way to check this is adding 'cpu' to sort keys and compare > > sample count and callchain count. > > > > What is the result of this command? > > > > # perf report -C 1 --no-children -g graph,0,count -n -s +cpu > > If that is the case, how would you explain my findings? Note how I did not > pass `-C 1`, and still missed up to 90% of backtraces: > > $ perf report --no-children -g fractal,0 > > - 9.47% kwrite ld-2.23.so [.] do_lookup_x > + 23.31% do_lookup_x > - 7.06% kwrite ld-2.23.so [.] strcmp > + 9.06% strcmp > + 5.71% kwrite ld-2.23.so [.] check_match Hmm.. I have no idea. Could you please show me the count of callchains with following command? $ perf report --no-children -g graph,0,count -n Also it'd be great if you double check the output of 'perf script' to ensure all the samples have callchains.. Thanks, Namhyung ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-12 13:55 ` Namhyung Kim @ 2016-06-13 11:01 ` Milian Wolff 0 siblings, 0 replies; 9+ messages in thread From: Milian Wolff @ 2016-06-13 11:01 UTC (permalink / raw) To: namhyung; +Cc: shiny.sebastian, linux-perf-users [-- Attachment #1: Type: text/plain, Size: 3397 bytes --] On Sonntag, 12. Juni 2016 22:55:26 CEST Namhyung Kim wrote: > Hi Milian, > > On Sat, Jun 11, 2016 at 01:53:34PM +0200, Milian Wolff wrote: > > On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote: > > > Hello, > > > > > > On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote: > > > > Thanks Milian. I added -g graph,0 and it did add lot more percentages > > > > - > > > > it's still not adding up though. > > > > > > > > #perf report -C 1 --no-children -g graph,0 > > > > > > I think it's because you used -C option to restrict samples only from > > > cpu 1. One way to check this is adding 'cpu' to sort keys and compare > > > sample count and callchain count. > > > > > > What is the result of this command? > > > > > > # perf report -C 1 --no-children -g graph,0,count -n -s +cpu > > > > If that is the case, how would you explain my findings? Note how I did not > > pass `-C 1`, and still missed up to 90% of backtraces: > > > > $ perf report --no-children -g fractal,0 > > > > - 9.47% kwrite ld-2.23.so [.] > > do_lookup_x > > > > + 23.31% do_lookup_x > > > > - 7.06% kwrite ld-2.23.so [.] strcmp > > > > + 9.06% strcmp > > > > + 5.71% kwrite ld-2.23.so [.] > > check_match > > Hmm.. I have no idea. Could you please show me the count of > callchains with following command? > > $ perf report --no-children -g graph,0,count -n - 12.43% 252 kwrite ld-2.23.so [.] do_lookup_x 51 do_lookup_x + _dl_lookup_symbol_x - 10.46% 223 kwrite ld-2.23.so [.] strcmp + 37 strcmp - 8.66% 175 kwrite ld-2.23.so [.] check_match 44 check_match do_lookup_x + _dl_lookup_symbol_x Note that I could not reproduce that on a different machine of mine. > Also it'd be great if you double check the output of 'perf script' to > ensure all the samples have callchains.. Quite a few samples don't have callchains: kwrite 14838 31462.912726: 222759 cycles:u: kwrite 14838 31462.913158: 207623 cycles:u: kwrite 14838 31462.913428: 196656 cycles:u: kwrite 14838 31462.913532: 194832 cycles:u: kwrite 14838 31462.913714: 229283 cycles:u: kwrite 14838 31462.913895: 239871 cycles:u: kwrite 14838 31462.914303: 234688 cycles:u: kwrite 14838 31462.914384: 223273 cycles:u: kwrite 14838 31462.914533: 281279 cycles:u: So this quite possibly is the issue in my case. Could we improve `perf report` to show a "<no stack>" entry or similar? I.e. something like this: - 12.43% 252 kwrite ld-2.23.so [.] do_lookup_x 51 do_lookup_x + _dl_lookup_symbol_x 201 <no stack> - 10.46% 223 kwrite ld-2.23.so [.] strcmp + 37 strcmp 186 <no stack> - 8.66% 175 kwrite ld-2.23.so [.] check_match 44 check_match do_lookup_x + _dl_lookup_symbol_x 131 <no stack> I'll have a look at implementing that. Should be simple enough, I hope. Bye -- Milian Wolff | milian.wolff@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5903 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-08 7:34 ` Milian Wolff 2016-06-08 16:21 ` Sebastian, Shiny @ 2016-06-09 15:13 ` Arnaldo Carvalho de Melo 2016-06-10 3:14 ` Namhyung Kim 1 sibling, 1 reply; 9+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-06-09 15:13 UTC (permalink / raw) To: Namhyung Kim Cc: Milian Wolff, shiny.sebastian, Frédéric Weisbecker, David Ahern, Jiri Olsa, linux-perf-users Em Wed, Jun 08, 2016 at 09:34:58AM +0200, Milian Wolff escreveu: > On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote: > > I work with Intel and am running perf tool within Linux VMs on Windows > > Hyper-V. With kernel 4.6 and corresponding perf version, percentages of > > call stacks at deeper levels do not seem to add up to 100% or parent value > > (26.97%): example below. I have tried the -percentage flag with both > > 'absolute' and 'relative' values and they don't seem to change the output > > at all. Is this a bug or something wrong with my perf flags ? > > Commands used: > > #perf record -a -g -c 10000 sleep 10 > > #perf report -i cpu3.data.old -C 1 --no-children > > > > - 27.48% fio [kernel.kallsyms] [k] __softirqentry_text_start > > __softirqentry_text_start > > - irq_exit > > - 26.97% smp_call_function_single_interrupt > > - call_function_single_interrupt > > + 4.21% __blk_run_queue > > + 4.09% hv_ringbuffer_write > > + 1.83% 0x99c > > + 1.78% __blockdev_direct_IO > > + 0.96% do_blockdev_direct_IO > > + 0.95% generic_make_request > > + 0.86% blk_queue_bio > > + 0.70% iov_iter_get_pages > > + 0.64% md_make_request > > + 0.55% do_io_submit > > 0.51% entry_SYSCALL_64_fastpath > > + 0.51% hyperv_vector_handler > > Can you try to look at the output of > > perf report -i cpu3.data.old -C 1 --no-children -g graph,0 > > i.e. potentially there are more entries with a cost of 0.5% or less, which by > default falls below the threshold. > > That said, I just tested it and it doesn't work for me either: Namyung, can you please take a look at this? - Arnaldo > tmp$ perf record --call-graph dwarf kwrite > [ perf record: Woken up 65 times to write data ] > [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ] > tmp$ perf report -g graph,0 --no-children > > - 9.47% kwrite ld-2.23.so [.] do_lookup_x > - 2.21% do_lookup_x > - _dl_lookup_symbol_x > + 1.42% _dl_relocate_object > + 0.74% _dl_fixup > + 7.06% kwrite ld-2.23.so [.] strcmp > > It's also bogus when I look at it with --children: > > - 10.76% 9.47% kwrite ld-2.23.so [.] do_lookup_x > + 1.54% _start > + 1.34% do_lookup_x > + 0.61% _dl_runtime_resolve_avx > - 7.06% 7.06% kwrite ld-2.23.so [.] strcmp > + 0.42% _start > + 0.22% _dl_runtime_resolve_avx > > So someone else with more insight should answer whether this output can be > explained somehow. > > > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to > > 100%. 27.69% swapper [kernel.kallsyms] [k] intel_idle > > > > ---intel_idle > > > > |--99.84%-- cpuidle_enter_state > > | > > | cpuidle_enter > > | call_cpuidle > > | cpu_startup_entry > > | > > | |--97.27%-- start_secondary > > | | > > | --2.73%-- rest_init > > | > > | start_kernel > > | x86_64_start_reservations > > | x86_64_start_kernel > > > > --0.16%-- [...] > > This output can be reproduced on newer perf by passing `-g fractal` to `perf > report`. I still get the issues as shown above though with a newer perf: > > - 9.47% kwrite ld-2.23.so [.] do_lookup_x > + 23.31% do_lookup_x > - 7.06% kwrite ld-2.23.so [.] strcmp > + 9.06% strcmp > + 5.71% kwrite ld-2.23.so [.] check_match > > I.e. ~74% and 91% of the samples don't seen to have a proper backtrace > attribution? > > Cheers > > -- > Milian Wolff | milian.wolff@kdab.com | Software Engineer > KDAB (Deutschland) GmbH&Co KG, a KDAB Group company > Tel: +49-30-521325470 > KDAB - The Qt Experts ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Perf report --percentage 2016-06-09 15:13 ` Arnaldo Carvalho de Melo @ 2016-06-10 3:14 ` Namhyung Kim 0 siblings, 0 replies; 9+ messages in thread From: Namhyung Kim @ 2016-06-10 3:14 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Milian Wolff, shiny.sebastian, Frédéric Weisbecker, David Ahern, Jiri Olsa, linux-perf-users Hi Arnaldo, On Fri, Jun 10, 2016 at 12:13 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Wed, Jun 08, 2016 at 09:34:58AM +0200, Milian Wolff escreveu: >> On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote: >> > I work with Intel and am running perf tool within Linux VMs on Windows >> > Hyper-V. With kernel 4.6 and corresponding perf version, percentages of >> > call stacks at deeper levels do not seem to add up to 100% or parent value >> > (26.97%): example below. I have tried the -percentage flag with both >> > 'absolute' and 'relative' values and they don't seem to change the output >> > at all. Is this a bug or something wrong with my perf flags ? > >> > Commands used: >> > #perf record -a -g -c 10000 sleep 10 >> > #perf report -i cpu3.data.old -C 1 --no-children >> > >> > - 27.48% fio [kernel.kallsyms] [k] __softirqentry_text_start >> > __softirqentry_text_start >> > - irq_exit >> > - 26.97% smp_call_function_single_interrupt >> > - call_function_single_interrupt >> > + 4.21% __blk_run_queue >> > + 4.09% hv_ringbuffer_write >> > + 1.83% 0x99c >> > + 1.78% __blockdev_direct_IO >> > + 0.96% do_blockdev_direct_IO >> > + 0.95% generic_make_request >> > + 0.86% blk_queue_bio >> > + 0.70% iov_iter_get_pages >> > + 0.64% md_make_request >> > + 0.55% do_io_submit >> > 0.51% entry_SYSCALL_64_fastpath >> > + 0.51% hyperv_vector_handler >> >> Can you try to look at the output of >> >> perf report -i cpu3.data.old -C 1 --no-children -g graph,0 >> >> i.e. potentially there are more entries with a cost of 0.5% or less, which by >> default falls below the threshold. >> >> That said, I just tested it and it doesn't work for me either: > > Namyung, can you please take a look at this? Will do. Thanks, Namhyung > >> tmp$ perf record --call-graph dwarf kwrite >> [ perf record: Woken up 65 times to write data ] >> [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ] >> tmp$ perf report -g graph,0 --no-children >> >> - 9.47% kwrite ld-2.23.so [.] do_lookup_x >> - 2.21% do_lookup_x >> - _dl_lookup_symbol_x >> + 1.42% _dl_relocate_object >> + 0.74% _dl_fixup >> + 7.06% kwrite ld-2.23.so [.] strcmp >> >> It's also bogus when I look at it with --children: >> >> - 10.76% 9.47% kwrite ld-2.23.so [.] do_lookup_x >> + 1.54% _start >> + 1.34% do_lookup_x >> + 0.61% _dl_runtime_resolve_avx >> - 7.06% 7.06% kwrite ld-2.23.so [.] strcmp >> + 0.42% _start >> + 0.22% _dl_runtime_resolve_avx >> >> So someone else with more insight should answer whether this output can be >> explained somehow. >> >> > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to >> > 100%. 27.69% swapper [kernel.kallsyms] [k] intel_idle >> > >> > ---intel_idle >> > >> > |--99.84%-- cpuidle_enter_state >> > | >> > | cpuidle_enter >> > | call_cpuidle >> > | cpu_startup_entry >> > | >> > | |--97.27%-- start_secondary >> > | | >> > | --2.73%-- rest_init >> > | >> > | start_kernel >> > | x86_64_start_reservations >> > | x86_64_start_kernel >> > >> > --0.16%-- [...] >> >> This output can be reproduced on newer perf by passing `-g fractal` to `perf >> report`. I still get the issues as shown above though with a newer perf: >> >> - 9.47% kwrite ld-2.23.so [.] do_lookup_x >> + 23.31% do_lookup_x >> - 7.06% kwrite ld-2.23.so [.] strcmp >> + 9.06% strcmp >> + 5.71% kwrite ld-2.23.so [.] check_match >> >> I.e. ~74% and 91% of the samples don't seen to have a proper backtrace >> attribution? >> >> Cheers >> >> -- >> Milian Wolff | milian.wolff@kdab.com | Software Engineer >> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company >> Tel: +49-30-521325470 >> KDAB - The Qt Experts > > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-06-13 11:01 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-06-07 19:21 Perf report --percentage Sebastian, Shiny 2016-06-08 7:34 ` Milian Wolff 2016-06-08 16:21 ` Sebastian, Shiny 2016-06-11 8:31 ` Namhyung Kim 2016-06-11 11:53 ` Milian Wolff 2016-06-12 13:55 ` Namhyung Kim 2016-06-13 11:01 ` Milian Wolff 2016-06-09 15:13 ` Arnaldo Carvalho de Melo 2016-06-10 3:14 ` Namhyung Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).