* perf sched trick to profile wait times: period miscalculated on multicore machines
@ 2015-08-04 15:36 Milian Wolff
0 siblings, 0 replies; only message in thread
From: Milian Wolff @ 2015-08-04 15:36 UTC (permalink / raw)
To: linux-perf-users
Hello all,
following the perf tutorial, I recorded and merged samples and backtraces to
find out where my code is sleeping. The result is something like this:
$ perf report -g graph --no-children --show-total-period --stdio
...
100.00% 24 8886224240 ex_sleep
fcca6947df440540ec364a93452b05b3a13ef7 [k] __schedule
|
---__schedule
schedule
do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call
__nanosleep
0x7f36c861f2ad
|
|--90.07%-- QThread::sleep
| main
|
|--9.02%-- QThread::msleep
| main
|
--0.91%-- QThread::usleep
main
Cool! But the period should be in ns, no? That number reported is wrong by an
order of magnitude, roughly:
The example code sleeps for 1s, 100ms and 10ms. perf stat agrees:
1.112868066 seconds time elapsed ( +- 0.01%, avg over 10 runs )
But perf record shows a period of 8886224240ns = 8.89s - a difference of
factor 8 (which corresponds to the number of cores on this machine, see also
below).
I recorded the data by doing
$ sudo mount -o remount,mode=755 /sys/kernel/debug
$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
$ echo \"-1\" | sudo tee /proc/sys/kernel/perf_event_paranoid
perf record -e sched:sched_stat_sleep -e sched:sched_switch -e
sched:sched_process_exit --call-graph dwarf -o perf.data.raw ./ex_sleep
perf inject -v -s -i perf.data.raw -o perf.data
perf report --no-children --show-total-period --stdio
One interesting find of mine is the following:
perf script -i perf.data.raw | grep delay
:7038 7038 [004] 80885.657156: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657158: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
:7038 7038 [004] 80885.657159: sched:sched_stat_sleep: comm=perf
pid=7039 delay=1015990 [ns]
swapper 0 [002] 80886.659750: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659752: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659753: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659754: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659754: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659755: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659756: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.659757: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=1000521253 [ns]
swapper 0 [002] 80886.759925: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759927: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759927: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759928: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759929: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759930: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759930: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.759931: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=100144138 [ns]
swapper 0 [002] 80886.770070: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770072: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770073: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770074: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770075: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770076: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770077: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
swapper 0 [002] 80886.770078: sched:sched_stat_sleep:
comm=ex_sleep pid=7039 delay=10114111 [ns]
My machine here is an 8 core, and it seems as if the sleep times are recorded
once for every core, thus explaining the factor 8 of difference. The
individual delays in the output above are correct, i.e. 1s, 100ms, 10ms and
some stuff at the beginning that is perf overhead.
Would be awesome if someone could fix this, or explain me how to use perf
correctly to not trigger this issue. As a workaround I'll devide the period
count by nproc for now.
Furthermore, it would be awesome if perf record would repeat the individual
period in the results, i.e. my ideal output would look somewhat like this:
100.00% 24 1.11s ex_sleep
fcca6947df440540ec364a93452b05b3a13ef7 [k] __schedule
|
---__schedule
schedule
do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call
__nanosleep
0x7f36c861f2ad
|
|--90.07% (1.00s) -- QThread::sleep
| main
|
|-- 9.02% (100ms) -- QThread::msleep
| main
|
-- 0.91% (10.0ms) -- QThread::usleep
main
Thanks
--
Milian Wolff
mail@milianw.de
http://milianw.de
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2015-08-04 15:36 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-04 15:36 perf sched trick to profile wait times: period miscalculated on multicore machines Milian Wolff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).