Perf report --percentage

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Perf report --percentage
@ 2016-06-07 19:21 Sebastian, Shiny
  2016-06-08  7:34 ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian, Shiny @ 2016-06-07 19:21 UTC (permalink / raw)
  To: linux-perf-users@vger.kernel.org


Hi,
 
I work with Intel and am running perf tool within Linux VMs on Windows Hyper-V.  With kernel 4.6 and corresponding perf version, percentages of call stacks at deeper levels  do not seem to add up to 100% or parent value (26.97%): example below. I have tried the -percentage flag with both 'absolute' and 'relative' values and they don't seem to change the output at all. Is this a bug or something wrong with my perf flags ?

Commands used:
#perf record  -a -g -c 10000 sleep 10
#perf report -i cpu3.data.old -C 1 --no-children

-   27.48%  fio          [kernel.kallsyms]  [k] __softirqentry_text_start
     __softirqentry_text_start
   - irq_exit
      - 26.97% smp_call_function_single_interrupt
         - call_function_single_interrupt
            + 4.21% __blk_run_queue
            + 4.09% hv_ringbuffer_write
            + 1.83% 0x99c
            + 1.78% __blockdev_direct_IO
            + 0.96% do_blockdev_direct_IO
            + 0.95% generic_make_request
            + 0.86% blk_queue_bio
            + 0.70% iov_iter_get_pages
            + 0.64% md_make_request
            + 0.55% do_io_submit
              0.51% entry_SYSCALL_64_fastpath
      + 0.51% hyperv_vector_handler


Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to 100%. 
    27.69%  swapper          [kernel.kallsyms]            [k] intel_idle
                    |
                    ---intel_idle
                       |
                       |--99.84%-- cpuidle_enter_state
                       |          cpuidle_enter
                       |          call_cpuidle
                       |          cpu_startup_entry
                       |          |
                       |          |--97.27%-- start_secondary
                       |          |
                       |           --2.73%-- rest_init
                       |                     start_kernel
                       |                     x86_64_start_reservations
                       |                     x86_64_start_kernel
                        --0.16%-- [...]


I am running SLES 12. 

Thanks,
Shiny

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-07 19:21 Perf report --percentage Sebastian, Shiny
@ 2016-06-08  7:34 ` Milian Wolff
  2016-06-08 16:21   ` Sebastian, Shiny
  2016-06-09 15:13   ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 9+ messages in thread
From: Milian Wolff @ 2016-06-08  7:34 UTC (permalink / raw)
  To: shiny.sebastian; +Cc: linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 4075 bytes --]

On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote:
> Hi,
>  
> I work with Intel and am running perf tool within Linux VMs on Windows
> Hyper-V.  With kernel 4.6 and corresponding perf version, percentages of
> call stacks at deeper levels  do not seem to add up to 100% or parent value
> (26.97%): example below. I have tried the -percentage flag with both
> 'absolute' and 'relative' values and they don't seem to change the output
> at all. Is this a bug or something wrong with my perf flags ?
> 
> Commands used:
> #perf record  -a -g -c 10000 sleep 10
> #perf report -i cpu3.data.old -C 1 --no-children
> 
> -   27.48%  fio          [kernel.kallsyms]  [k] __softirqentry_text_start
>      __softirqentry_text_start
>    - irq_exit
>       - 26.97% smp_call_function_single_interrupt
>          - call_function_single_interrupt
>             + 4.21% __blk_run_queue
>             + 4.09% hv_ringbuffer_write
>             + 1.83% 0x99c
>             + 1.78% __blockdev_direct_IO
>             + 0.96% do_blockdev_direct_IO
>             + 0.95% generic_make_request
>             + 0.86% blk_queue_bio
>             + 0.70% iov_iter_get_pages
>             + 0.64% md_make_request
>             + 0.55% do_io_submit
>               0.51% entry_SYSCALL_64_fastpath
>       + 0.51% hyperv_vector_handler

Can you try to look at the output of

perf report -i cpu3.data.old -C 1 --no-children -g graph,0

i.e. potentially there are more entries with a cost of 0.5% or less, which by 
default falls below the threshold.

That said, I just tested it and it doesn't work for me either:

tmp$ perf record --call-graph dwarf kwrite
[ perf record: Woken up 65 times to write data ]
[ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ]
tmp$ perf report -g graph,0 --no-children

-    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
   - 2.21% do_lookup_x
      - _dl_lookup_symbol_x
         + 1.42% _dl_relocate_object
         + 0.74% _dl_fixup
+    7.06%  kwrite           ld-2.23.so                     [.] strcmp 

It's also bogus when I look at it with --children:

-   10.76%     9.47%  kwrite           ld-2.23.so           [.] do_lookup_x
   + 1.54% _start
   + 1.34% do_lookup_x
   + 0.61% _dl_runtime_resolve_avx
-    7.06%     7.06%  kwrite           ld-2.23.so           [.] strcmp
   + 0.42% _start
   + 0.22% _dl_runtime_resolve_avx

So someone else with more insight should answer whether this output can be 
explained somehow.

> Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to
> 100%.  27.69%  swapper          [kernel.kallsyms]            [k] intel_idle
> 
>                     ---intel_idle
> 
>                        |--99.84%-- cpuidle_enter_state
>                        |
>                        |          cpuidle_enter
>                        |          call_cpuidle
>                        |          cpu_startup_entry
>                        |          
>                        |          |--97.27%-- start_secondary
>                        |          |
>                        |           --2.73%-- rest_init
>                        |           
>                        |                     start_kernel
>                        |                     x86_64_start_reservations
>                        |                     x86_64_start_kernel
> 
>                         --0.16%-- [...]

This output can be reproduced on newer perf by passing `-g fractal` to `perf 
report`. I still get the issues as shown above though with a newer perf:

-    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
   + 23.31% do_lookup_x
-    7.06%  kwrite           ld-2.23.so                     [.] strcmp
   + 9.06% strcmp
+    5.71%  kwrite           ld-2.23.so                     [.] check_match

I.e. ~74% and 91% of the samples don't seen to have a proper backtrace 
attribution?

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Perf report --percentage
  2016-06-08  7:34 ` Milian Wolff
@ 2016-06-08 16:21   ` Sebastian, Shiny
  2016-06-11  8:31     ` Namhyung Kim
  2016-06-09 15:13   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian, Shiny @ 2016-06-08 16:21 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users@vger.kernel.org

Thanks Milian. I added -g graph,0 and it did add lot more percentages - it's still not adding up though.

#perf report -C 1 --no-children -g graph,0
output: 
-   19.84%  fio          [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      + 7.88% hv_ringbuffer_write
      + 4.88% scsi_end_request
      + 3.43% mix_pool_bytes
      + 0.92% aio_complete
      + 0.85% blk_run_queue
      + 0.80% scsi_put_command
      + 0.37% dio_bio_end_aio
      + 0.35% do_blockdev_direct_IO
      + 0.07% dio_complete
      + 0.07% scsi_run_queue
      + 0.06% __blockdev_direct_IO
      + 0.05% scsi_io_completion
      + 0.04% bio_endio
      + 0.02% add_timer_randomness
      + 0.02% vmbus_sendpacket_mpb_desc
      + 0.01% hrtimer_start_range_ns


Thanks,
Shiny


-----Original Message-----
From: milian [mailto:milian.wolff@kdab.com] 
Sent: Wednesday, June 8, 2016 12:35 AM
To: Sebastian, Shiny <shiny.sebastian@intel.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Perf report --percentage

On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote:
> Hi,
>  
> I work with Intel and am running perf tool within Linux VMs on Windows
> Hyper-V.  With kernel 4.6 and corresponding perf version, percentages of
> call stacks at deeper levels  do not seem to add up to 100% or parent value
> (26.97%): example below. I have tried the -percentage flag with both
> 'absolute' and 'relative' values and they don't seem to change the output
> at all. Is this a bug or something wrong with my perf flags ?
> 
> Commands used:
> #perf record  -a -g -c 10000 sleep 10
> #perf report -i cpu3.data.old -C 1 --no-children
> 
> -   27.48%  fio          [kernel.kallsyms]  [k] __softirqentry_text_start
>      __softirqentry_text_start
>    - irq_exit
>       - 26.97% smp_call_function_single_interrupt
>          - call_function_single_interrupt
>             + 4.21% __blk_run_queue
>             + 4.09% hv_ringbuffer_write
>             + 1.83% 0x99c
>             + 1.78% __blockdev_direct_IO
>             + 0.96% do_blockdev_direct_IO
>             + 0.95% generic_make_request
>             + 0.86% blk_queue_bio
>             + 0.70% iov_iter_get_pages
>             + 0.64% md_make_request
>             + 0.55% do_io_submit
>               0.51% entry_SYSCALL_64_fastpath
>       + 0.51% hyperv_vector_handler

Can you try to look at the output of

perf report -i cpu3.data.old -C 1 --no-children -g graph,0

i.e. potentially there are more entries with a cost of 0.5% or less, which by 
default falls below the threshold.

That said, I just tested it and it doesn't work for me either:

tmp$ perf record --call-graph dwarf kwrite
[ perf record: Woken up 65 times to write data ]
[ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ]
tmp$ perf report -g graph,0 --no-children

-    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
   - 2.21% do_lookup_x
      - _dl_lookup_symbol_x
         + 1.42% _dl_relocate_object
         + 0.74% _dl_fixup
+    7.06%  kwrite           ld-2.23.so                     [.] strcmp 

It's also bogus when I look at it with --children:

-   10.76%     9.47%  kwrite           ld-2.23.so           [.] do_lookup_x
   + 1.54% _start
   + 1.34% do_lookup_x
   + 0.61% _dl_runtime_resolve_avx
-    7.06%     7.06%  kwrite           ld-2.23.so           [.] strcmp
   + 0.42% _start
   + 0.22% _dl_runtime_resolve_avx

So someone else with more insight should answer whether this output can be 
explained somehow.

> Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to
> 100%.  27.69%  swapper          [kernel.kallsyms]            [k] intel_idle
> 
>                     ---intel_idle
> 
>                        |--99.84%-- cpuidle_enter_state
>                        |
>                        |          cpuidle_enter
>                        |          call_cpuidle
>                        |          cpu_startup_entry
>                        |          
>                        |          |--97.27%-- start_secondary
>                        |          |
>                        |           --2.73%-- rest_init
>                        |           
>                        |                     start_kernel
>                        |                     x86_64_start_reservations
>                        |                     x86_64_start_kernel
> 
>                         --0.16%-- [...]

This output can be reproduced on newer perf by passing `-g fractal` to `perf 
report`. I still get the issues as shown above though with a newer perf:

-    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
   + 23.31% do_lookup_x
-    7.06%  kwrite           ld-2.23.so                     [.] strcmp
   + 9.06% strcmp
+    5.71%  kwrite           ld-2.23.so                     [.] check_match

I.e. ~74% and 91% of the samples don't seen to have a proper backtrace 
attribution?

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-08 16:21   ` Sebastian, Shiny
@ 2016-06-11  8:31     ` Namhyung Kim
  2016-06-11 11:53       ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Namhyung Kim @ 2016-06-11  8:31 UTC (permalink / raw)
  To: Sebastian, Shiny; +Cc: Milian Wolff, linux-perf-users@vger.kernel.org

Hello,

On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote:
> Thanks Milian. I added -g graph,0 and it did add lot more percentages - it's still not adding up though.
> 
> #perf report -C 1 --no-children -g graph,0

I think it's because you used -C option to restrict samples only from
cpu 1.  One way to check this is adding 'cpu' to sort keys and compare
sample count and callchain count.

What is the result of this command?

  # perf report -C 1 --no-children -g graph,0,count -n -s +cpu


Thanks,
Namhyung


> output: 
> -   19.84%  fio          [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       + 7.88% hv_ringbuffer_write
>       + 4.88% scsi_end_request
>       + 3.43% mix_pool_bytes
>       + 0.92% aio_complete
>       + 0.85% blk_run_queue
>       + 0.80% scsi_put_command
>       + 0.37% dio_bio_end_aio
>       + 0.35% do_blockdev_direct_IO
>       + 0.07% dio_complete
>       + 0.07% scsi_run_queue
>       + 0.06% __blockdev_direct_IO
>       + 0.05% scsi_io_completion
>       + 0.04% bio_endio
>       + 0.02% add_timer_randomness
>       + 0.02% vmbus_sendpacket_mpb_desc
>       + 0.01% hrtimer_start_range_ns
> 
> 
> Thanks,
> Shiny

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-11  8:31     ` Namhyung Kim
@ 2016-06-11 11:53       ` Milian Wolff
  2016-06-12 13:55         ` Namhyung Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Milian Wolff @ 2016-06-11 11:53 UTC (permalink / raw)
  To: namhyung; +Cc: shiny.sebastian, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]

On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote:
> Hello,
> 
> On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote:
> > Thanks Milian. I added -g graph,0 and it did add lot more percentages -
> > it's still not adding up though.
> > 
> > #perf report -C 1 --no-children -g graph,0
> 
> I think it's because you used -C option to restrict samples only from
> cpu 1.  One way to check this is adding 'cpu' to sort keys and compare
> sample count and callchain count.
> 
> What is the result of this command?
> 
>   # perf report -C 1 --no-children -g graph,0,count -n -s +cpu

If that is the case, how would you explain my findings? Note how I did not 
pass `-C 1`, and still missed up to 90% of backtraces:

$ perf report --no-children -g fractal,0

-    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
   + 23.31% do_lookup_x
-    7.06%  kwrite           ld-2.23.so                     [.] strcmp
   + 9.06% strcmp
+    5.71%  kwrite           ld-2.23.so                     [.] check_match

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-11 11:53       ` Milian Wolff
@ 2016-06-12 13:55         ` Namhyung Kim
  2016-06-13 11:01           ` Milian Wolff
  0 siblings, 1 reply; 9+ messages in thread
From: Namhyung Kim @ 2016-06-12 13:55 UTC (permalink / raw)
  To: Milian Wolff; +Cc: shiny.sebastian, linux-perf-users

Hi Milian,

On Sat, Jun 11, 2016 at 01:53:34PM +0200, Milian Wolff wrote:
> On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote:
> > Hello,
> > 
> > On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote:
> > > Thanks Milian. I added -g graph,0 and it did add lot more percentages -
> > > it's still not adding up though.
> > > 
> > > #perf report -C 1 --no-children -g graph,0
> > 
> > I think it's because you used -C option to restrict samples only from
> > cpu 1.  One way to check this is adding 'cpu' to sort keys and compare
> > sample count and callchain count.
> > 
> > What is the result of this command?
> > 
> >   # perf report -C 1 --no-children -g graph,0,count -n -s +cpu
> 
> If that is the case, how would you explain my findings? Note how I did not 
> pass `-C 1`, and still missed up to 90% of backtraces:
> 
> $ perf report --no-children -g fractal,0
> 
> -    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
>    + 23.31% do_lookup_x
> -    7.06%  kwrite           ld-2.23.so                     [.] strcmp
>    + 9.06% strcmp
> +    5.71%  kwrite           ld-2.23.so                     [.] check_match

Hmm.. I have no idea.  Could you please show me the count of
callchains with following command?

  $ perf report --no-children -g graph,0,count -n


Also it'd be great if you double check the output of 'perf script' to
ensure all the samples have callchains..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-12 13:55         ` Namhyung Kim
@ 2016-06-13 11:01           ` Milian Wolff
  0 siblings, 0 replies; 9+ messages in thread
From: Milian Wolff @ 2016-06-13 11:01 UTC (permalink / raw)
  To: namhyung; +Cc: shiny.sebastian, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 3397 bytes --]

On Sonntag, 12. Juni 2016 22:55:26 CEST Namhyung Kim wrote:
> Hi Milian,
> 
> On Sat, Jun 11, 2016 at 01:53:34PM +0200, Milian Wolff wrote:
> > On Samstag, 11. Juni 2016 17:31:48 CEST Namhyung Kim wrote:
> > > Hello,
> > > 
> > > On Wed, Jun 08, 2016 at 04:21:43PM +0000, Sebastian, Shiny wrote:
> > > > Thanks Milian. I added -g graph,0 and it did add lot more percentages
> > > > -
> > > > it's still not adding up though.
> > > > 
> > > > #perf report -C 1 --no-children -g graph,0
> > > 
> > > I think it's because you used -C option to restrict samples only from
> > > cpu 1.  One way to check this is adding 'cpu' to sort keys and compare
> > > sample count and callchain count.
> > > 
> > > What is the result of this command?
> > > 
> > >   # perf report -C 1 --no-children -g graph,0,count -n -s +cpu
> > 
> > If that is the case, how would you explain my findings? Note how I did not
> > pass `-C 1`, and still missed up to 90% of backtraces:
> > 
> > $ perf report --no-children -g fractal,0
> > 
> > -    9.47%  kwrite           ld-2.23.so                     [.]
> > do_lookup_x
> > 
> >    + 23.31% do_lookup_x
> > 
> > -    7.06%  kwrite           ld-2.23.so                     [.] strcmp
> > 
> >    + 9.06% strcmp
> > 
> > +    5.71%  kwrite           ld-2.23.so                     [.]
> > check_match
> 
> Hmm.. I have no idea.  Could you please show me the count of
> callchains with following command?
> 
>   $ perf report --no-children -g graph,0,count -n

-   12.43%           252  kwrite           ld-2.23.so                     [.]
do_lookup_x
     51 do_lookup_x
      + _dl_lookup_symbol_x
-   10.46%           223  kwrite           ld-2.23.so                     [.] 
strcmp
   + 37 strcmp
-    8.66%           175  kwrite           ld-2.23.so                     [.] 
check_match
     44 check_match
        do_lookup_x
      + _dl_lookup_symbol_x

Note that I could not reproduce that on a different machine of mine.

> Also it'd be great if you double check the output of 'perf script' to
> ensure all the samples have callchains..

Quite a few samples don't have callchains:

kwrite 14838 31462.912726:     222759 cycles:u: 

kwrite 14838 31462.913158:     207623 cycles:u: 

kwrite 14838 31462.913428:     196656 cycles:u: 

kwrite 14838 31462.913532:     194832 cycles:u: 

kwrite 14838 31462.913714:     229283 cycles:u: 

kwrite 14838 31462.913895:     239871 cycles:u: 

kwrite 14838 31462.914303:     234688 cycles:u: 

kwrite 14838 31462.914384:     223273 cycles:u: 

kwrite 14838 31462.914533:     281279 cycles:u: 

So this quite possibly is the issue in my case.

Could we improve `perf report` to show a "<no stack>" entry or similar? I.e. 
something like this:

-   12.43%           252  kwrite           ld-2.23.so                     [.]
do_lookup_x
     51 do_lookup_x
      + _dl_lookup_symbol_x
    201 <no stack>
-   10.46%           223  kwrite           ld-2.23.so                     [.] 
strcmp
   + 37 strcmp
    186 <no stack>
-    8.66%           175  kwrite           ld-2.23.so                     [.] 
check_match
     44 check_match
        do_lookup_x
      + _dl_lookup_symbol_x
    131 <no stack>

I'll have a look at implementing that. Should be simple enough, I hope.

Bye
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-08  7:34 ` Milian Wolff
  2016-06-08 16:21   ` Sebastian, Shiny
@ 2016-06-09 15:13   ` Arnaldo Carvalho de Melo
  2016-06-10  3:14     ` Namhyung Kim
  1 sibling, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-09 15:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Milian Wolff, shiny.sebastian, Frédéric Weisbecker,
	David Ahern, Jiri Olsa, linux-perf-users

Em Wed, Jun 08, 2016 at 09:34:58AM +0200, Milian Wolff escreveu:
> On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote:
> > I work with Intel and am running perf tool within Linux VMs on Windows
> > Hyper-V.  With kernel 4.6 and corresponding perf version, percentages of
> > call stacks at deeper levels  do not seem to add up to 100% or parent value
> > (26.97%): example below. I have tried the -percentage flag with both
> > 'absolute' and 'relative' values and they don't seem to change the output
> > at all. Is this a bug or something wrong with my perf flags ?

> > Commands used:
> > #perf record  -a -g -c 10000 sleep 10
> > #perf report -i cpu3.data.old -C 1 --no-children
> > 
> > -   27.48%  fio          [kernel.kallsyms]  [k] __softirqentry_text_start
> >      __softirqentry_text_start
> >    - irq_exit
> >       - 26.97% smp_call_function_single_interrupt
> >          - call_function_single_interrupt
> >             + 4.21% __blk_run_queue
> >             + 4.09% hv_ringbuffer_write
> >             + 1.83% 0x99c
> >             + 1.78% __blockdev_direct_IO
> >             + 0.96% do_blockdev_direct_IO
> >             + 0.95% generic_make_request
> >             + 0.86% blk_queue_bio
> >             + 0.70% iov_iter_get_pages
> >             + 0.64% md_make_request
> >             + 0.55% do_io_submit
> >               0.51% entry_SYSCALL_64_fastpath
> >       + 0.51% hyperv_vector_handler
> 
> Can you try to look at the output of
> 
> perf report -i cpu3.data.old -C 1 --no-children -g graph,0
> 
> i.e. potentially there are more entries with a cost of 0.5% or less, which by 
> default falls below the threshold.
> 
> That said, I just tested it and it doesn't work for me either:

Namyung, can you please take a look at this?

- Arnaldo
 
> tmp$ perf record --call-graph dwarf kwrite
> [ perf record: Woken up 65 times to write data ]
> [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ]
> tmp$ perf report -g graph,0 --no-children
> 
> -    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
>    - 2.21% do_lookup_x
>       - _dl_lookup_symbol_x
>          + 1.42% _dl_relocate_object
>          + 0.74% _dl_fixup
> +    7.06%  kwrite           ld-2.23.so                     [.] strcmp 
> 
> It's also bogus when I look at it with --children:
> 
> -   10.76%     9.47%  kwrite           ld-2.23.so           [.] do_lookup_x
>    + 1.54% _start
>    + 1.34% do_lookup_x
>    + 0.61% _dl_runtime_resolve_avx
> -    7.06%     7.06%  kwrite           ld-2.23.so           [.] strcmp
>    + 0.42% _start
>    + 0.22% _dl_runtime_resolve_avx
> 
> So someone else with more insight should answer whether this output can be 
> explained somehow.
> 
> > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to
> > 100%.  27.69%  swapper          [kernel.kallsyms]            [k] intel_idle
> > 
> >                     ---intel_idle
> > 
> >                        |--99.84%-- cpuidle_enter_state
> >                        |
> >                        |          cpuidle_enter
> >                        |          call_cpuidle
> >                        |          cpu_startup_entry
> >                        |          
> >                        |          |--97.27%-- start_secondary
> >                        |          |
> >                        |           --2.73%-- rest_init
> >                        |           
> >                        |                     start_kernel
> >                        |                     x86_64_start_reservations
> >                        |                     x86_64_start_kernel
> > 
> >                         --0.16%-- [...]
> 
> This output can be reproduced on newer perf by passing `-g fractal` to `perf 
> report`. I still get the issues as shown above though with a newer perf:
> 
> -    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
>    + 23.31% do_lookup_x
> -    7.06%  kwrite           ld-2.23.so                     [.] strcmp
>    + 9.06% strcmp
> +    5.71%  kwrite           ld-2.23.so                     [.] check_match
> 
> I.e. ~74% and 91% of the samples don't seen to have a proper backtrace 
> attribution?
> 
> Cheers
> 
> -- 
> Milian Wolff | milian.wolff@kdab.com | Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Perf report --percentage
  2016-06-09 15:13   ` Arnaldo Carvalho de Melo
@ 2016-06-10  3:14     ` Namhyung Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Namhyung Kim @ 2016-06-10  3:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Milian Wolff, shiny.sebastian, Frédéric Weisbecker,
	David Ahern, Jiri Olsa, linux-perf-users

Hi Arnaldo,

On Fri, Jun 10, 2016 at 12:13 AM, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
> Em Wed, Jun 08, 2016 at 09:34:58AM +0200, Milian Wolff escreveu:
>> On Dienstag, 7. Juni 2016 19:21:15 CEST Sebastian, Shiny wrote:
>> > I work with Intel and am running perf tool within Linux VMs on Windows
>> > Hyper-V.  With kernel 4.6 and corresponding perf version, percentages of
>> > call stacks at deeper levels  do not seem to add up to 100% or parent value
>> > (26.97%): example below. I have tried the -percentage flag with both
>> > 'absolute' and 'relative' values and they don't seem to change the output
>> > at all. Is this a bug or something wrong with my perf flags ?
>
>> > Commands used:
>> > #perf record  -a -g -c 10000 sleep 10
>> > #perf report -i cpu3.data.old -C 1 --no-children
>> >
>> > -   27.48%  fio          [kernel.kallsyms]  [k] __softirqentry_text_start
>> >      __softirqentry_text_start
>> >    - irq_exit
>> >       - 26.97% smp_call_function_single_interrupt
>> >          - call_function_single_interrupt
>> >             + 4.21% __blk_run_queue
>> >             + 4.09% hv_ringbuffer_write
>> >             + 1.83% 0x99c
>> >             + 1.78% __blockdev_direct_IO
>> >             + 0.96% do_blockdev_direct_IO
>> >             + 0.95% generic_make_request
>> >             + 0.86% blk_queue_bio
>> >             + 0.70% iov_iter_get_pages
>> >             + 0.64% md_make_request
>> >             + 0.55% do_io_submit
>> >               0.51% entry_SYSCALL_64_fastpath
>> >       + 0.51% hyperv_vector_handler
>>
>> Can you try to look at the output of
>>
>> perf report -i cpu3.data.old -C 1 --no-children -g graph,0
>>
>> i.e. potentially there are more entries with a cost of 0.5% or less, which by
>> default falls below the threshold.
>>
>> That said, I just tested it and it doesn't work for me either:
>
> Namyung, can you please take a look at this?

Will do.

Thanks,
Namhyung


>
>> tmp$ perf record --call-graph dwarf kwrite
>> [ perf record: Woken up 65 times to write data ]
>> [ perf record: Captured and wrote 16.308 MB perf.data (2032 samples) ]
>> tmp$ perf report -g graph,0 --no-children
>>
>> -    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
>>    - 2.21% do_lookup_x
>>       - _dl_lookup_symbol_x
>>          + 1.42% _dl_relocate_object
>>          + 0.74% _dl_fixup
>> +    7.06%  kwrite           ld-2.23.so                     [.] strcmp
>>
>> It's also bogus when I look at it with --children:
>>
>> -   10.76%     9.47%  kwrite           ld-2.23.so           [.] do_lookup_x
>>    + 1.54% _start
>>    + 1.34% do_lookup_x
>>    + 0.61% _dl_runtime_resolve_avx
>> -    7.06%     7.06%  kwrite           ld-2.23.so           [.] strcmp
>>    + 0.42% _start
>>    + 0.22% _dl_runtime_resolve_avx
>>
>> So someone else with more insight should answer whether this output can be
>> explained somehow.
>>
>> > Older version: Perf version 4.2 on kernel 4.2 - seems to normalize them to
>> > 100%.  27.69%  swapper          [kernel.kallsyms]            [k] intel_idle
>> >
>> >                     ---intel_idle
>> >
>> >                        |--99.84%-- cpuidle_enter_state
>> >                        |
>> >                        |          cpuidle_enter
>> >                        |          call_cpuidle
>> >                        |          cpu_startup_entry
>> >                        |
>> >                        |          |--97.27%-- start_secondary
>> >                        |          |
>> >                        |           --2.73%-- rest_init
>> >                        |
>> >                        |                     start_kernel
>> >                        |                     x86_64_start_reservations
>> >                        |                     x86_64_start_kernel
>> >
>> >                         --0.16%-- [...]
>>
>> This output can be reproduced on newer perf by passing `-g fractal` to `perf
>> report`. I still get the issues as shown above though with a newer perf:
>>
>> -    9.47%  kwrite           ld-2.23.so                     [.] do_lookup_x
>>    + 23.31% do_lookup_x
>> -    7.06%  kwrite           ld-2.23.so                     [.] strcmp
>>    + 9.06% strcmp
>> +    5.71%  kwrite           ld-2.23.so                     [.] check_match
>>
>> I.e. ~74% and 91% of the samples don't seen to have a proper backtrace
>> attribution?
>>
>> Cheers
>>
>> --
>> Milian Wolff | milian.wolff@kdab.com | Software Engineer
>> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
>> Tel: +49-30-521325470
>> KDAB - The Qt Experts
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-13 11:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-07 19:21 Perf report --percentage Sebastian, Shiny
2016-06-08  7:34 ` Milian Wolff
2016-06-08 16:21   ` Sebastian, Shiny
2016-06-11  8:31     ` Namhyung Kim
2016-06-11 11:53       ` Milian Wolff
2016-06-12 13:55         ` Namhyung Kim
2016-06-13 11:01           ` Milian Wolff
2016-06-09 15:13   ` Arnaldo Carvalho de Melo
2016-06-10  3:14     ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).