From: Milian Wolff <mail@milianw.de>
To: linux-perf-users@vger.kernel.org
Subject: perf sched trick to profile wait times: period miscalculated on multicore machines
Date: Tue, 04 Aug 2015 17:36:54 +0200 [thread overview]
Message-ID: <2080352.v3MQbmJ2PT@milian-kdab2> (raw)
Hello all,
following the perf tutorial, I recorded and merged samples and backtraces to
find out where my code is sleeping. The result is something like this:
$ perf report -g graph --no-children --show-total-period --stdio
...
100.00% 24 8886224240 ex_sleep
fcca6947df440540ec364a93452b05b3a13ef7 [k] __schedule
|
---__schedule
schedule
do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call
__nanosleep
0x7f36c861f2ad
|
|--90.07%-- QThread::sleep
| main
|
|--9.02%-- QThread::msleep
| main
|
--0.91%-- QThread::usleep
main
Cool! But the period should be in ns, no? That number reported is wrong by an
order of magnitude, roughly:
The example code sleeps for 1s, 100ms and 10ms. perf stat agrees:
1.112868066 seconds time elapsed ( +- 0.01%, avg over 10 runs )
But perf record shows a period of 8886224240ns = 8.89s - a difference of
factor 8 (which corresponds to the number of cores on this machine, see also
below).
I recorded the data by doing
$ sudo mount -o remount,mode=755 /sys/kernel/debug
$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
$ echo \"-1\" | sudo tee /proc/sys/kernel/perf_event_paranoid
perf record -e sched:sched_stat_sleep -e sched:sched_switch -e
sched:sched_process_exit --call-graph dwarf -o perf.data.raw ./ex_sleep
perf inject -v -s -i perf.data.raw -o perf.data
perf report --no-children --show-total-period --stdio
One interesting find of mine is the following:
perf script -i perf.data.raw | grep delay
:7038 7038 [004] 80885.657156: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
swapper 0 [002] 80886.659750: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659752: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659753: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659754: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659754: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659755: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659756: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659757: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.759925: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759927: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759927: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759928: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759929: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759930: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759930: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759931: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.770070: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770072: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770073: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770074: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770075: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770076: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770077: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770078: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
My machine here is an 8 core, and it seems as if the sleep times are recorded
once for every core, thus explaining the factor 8 of difference. The
individual delays in the output above are correct, i.e. 1s, 100ms, 10ms and
some stuff at the beginning that is perf overhead.
Would be awesome if someone could fix this, or explain me how to use perf
correctly to not trigger this issue. As a workaround I'll devide the period
count by nproc for now.
Furthermore, it would be awesome if perf record would repeat the individual
period in the results, i.e. my ideal output would look somewhat like this:
100.00% 24 1.11s ex_sleep
fcca6947df440540ec364a93452b05b3a13ef7 [k] __schedule
|
---__schedule
schedule
do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call
__nanosleep
0x7f36c861f2ad
|
|--90.07% (1.00s) -- QThread::sleep
| main
|
|-- 9.02% (100ms) -- QThread::msleep
| main
|
-- 0.91% (10.0ms) -- QThread::usleep
main
Thanks
--
Milian Wolff
mail@milianw.de
http://milianw.de
reply other threads:[~2015-08-04 15:36 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2080352.v3MQbmJ2PT@milian-kdab2 \
--to=mail@milianw.de \
--cc=linux-perf-users@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).