perf sched trick to profile wait times: period miscalculated on multicore machines

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Milian Wolff <mail@milianw.de>
To: linux-perf-users@vger.kernel.org
Subject: perf sched trick to profile wait times: period miscalculated on multicore machines
Date: Tue, 04 Aug 2015 17:36:54 +0200	[thread overview]
Message-ID: <2080352.v3MQbmJ2PT@milian-kdab2> (raw)

Hello all,

following the perf tutorial, I recorded and merged samples and backtraces to 
find out where my code is sleeping. The result is something like this:

$ perf report -g graph --no-children --show-total-period --stdio
...

   100.00%            24    8886224240  ex_sleep  
fcca6947df440540ec364a93452b05b3a13ef7  [k] __schedule
            |
            ---__schedule
               schedule
               do_nanosleep
               hrtimer_nanosleep
               sys_nanosleep
               system_call
               __nanosleep
               0x7f36c861f2ad
               |          
               |--90.07%-- QThread::sleep
               |          main
               |          
               |--9.02%-- QThread::msleep
               |          main
               |          
                --0.91%-- QThread::usleep
                          main

Cool! But the period should be in ns, no? That number reported is wrong by an 
order of magnitude, roughly:

The example code sleeps for 1s, 100ms and 10ms. perf stat agrees:

1.112868066 seconds time elapsed ( +-  0.01%, avg over 10 runs )

But perf record shows a period of 8886224240ns = 8.89s - a difference of 
factor 8 (which corresponds to the number of cores on this machine, see also 
below).

I recorded the data by doing

$ sudo mount -o remount,mode=755 /sys/kernel/debug
$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
$ echo \"-1\" | sudo tee /proc/sys/kernel/perf_event_paranoid

perf record -e sched:sched_stat_sleep -e sched:sched_switch -e 
sched:sched_process_exit --call-graph dwarf -o perf.data.raw ./ex_sleep
perf inject -v -s -i perf.data.raw -o perf.data
perf report --no-children --show-total-period --stdio

One interesting find of mine is the following:

 perf script -i perf.data.raw | grep delay
           :7038  7038 [004] 80885.657156: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
           :7038  7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf 
pid=7039 delay=1015990 [ns]
         swapper     0 [002] 80886.659750: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659752: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659753: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659754: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659754: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659755: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659756: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.659757: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=1000521253 [ns]
         swapper     0 [002] 80886.759925: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759927: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759927: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759928: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759929: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759930: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759930: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.759931: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=100144138 [ns]
         swapper     0 [002] 80886.770070: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770072: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770073: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770074: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770075: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770076: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770077: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]
         swapper     0 [002] 80886.770078: sched:sched_stat_sleep: 
comm=ex_sleep pid=7039 delay=10114111 [ns]

My machine here is an 8 core, and it seems as if the sleep times are recorded 
once for every core, thus explaining the factor 8 of difference. The 
individual delays in the output above are correct, i.e. 1s, 100ms, 10ms and 
some stuff at the beginning that is perf overhead.

Would be awesome if someone could fix this, or explain me how to use perf 
correctly to not trigger this issue. As a workaround I'll devide the period 
count by nproc for now.

Furthermore, it would be awesome if perf record would repeat the individual 
period in the results, i.e. my ideal output would look somewhat like this:

  100.00%            24    1.11s  ex_sleep  
fcca6947df440540ec364a93452b05b3a13ef7  [k] __schedule
            |
            ---__schedule
               schedule
               do_nanosleep
               hrtimer_nanosleep
               sys_nanosleep
               system_call
               __nanosleep
               0x7f36c861f2ad
               |          
               |--90.07% (1.00s) -- QThread::sleep
               |          main
               |          
               |-- 9.02% (100ms) -- QThread::msleep
               |          main
               |          
                -- 0.91% (10.0ms) -- QThread::usleep
                          main

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

                 reply	other threads:[~2015-08-04 15:36 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2080352.v3MQbmJ2PT@milian-kdab2 \
    --to=mail@milianw.de \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).