From: Andrew Vagin <avagin@openvz.org>
To: unlisted-recipients:; (no To-header on input)
Cc: Arun Sharma <asharma@fb.com>, Oleg Strikov <OSTRIKOV@nvidia.com>,
Steven Rostedt <rostedt@goodmis.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-kernel@vger.kernel.org
Subject: [RFC] [PATCH 0/5] Teach perf tool to profile sleep times (V4)
Date: Mon, 4 Jun 2012 12:09:37 +0400 [thread overview]
Message-ID: <1338797382-287275-1-git-send-email-avagin@openvz.org> (raw)
Do you want to know where your code waits locks for a long time?
Yes! It's for you. This feature helps you to find bottlenecks.
It's not artificial task. Once one of my colleague was investigating a
scalability problem. He pressed sysrq-t some times and tried to merge
call-chains by hand. But perf can do that better.
Here is fourth attempt to add this functionality in the kernel.
Previous attempts:
* [PATCH 0/4] trace: add ability to collect call chains of non-current task.
* [PATCH rh6 0/7] Teach perf tool to collect D-state statistics
This version doesn't work for non-root users.
* "sched/tracing: Add a new tracepoint for sleeptime" by Arun Sharma
It has been committed, but then reverted due to performance reason.
This series resolves a problem with events filtering and non-root users.
Before I tried to use sched_stat* events, but it occurs in a context of
another task. For this case logic of perf_event_context* doesn't work.
This series adds one more event sched:sched_switch_finish, it's sent
when a task starts executing on a cpu. When a task goes out from cpu,
sched:sched_switch is sent.
The time difference between sched_switch and sched_switch_finish is a time
when a task wasn't being executed on a cpu.
For profiling sleep time, events of both types should be collected and
filtered by a defined state (interruptible of uninterruptible sleep).
Then events should be post-processed with help "perf inject". It saves the time
of sched_switch event for each process and when it meets stat_switch_finish,
it sets time difference in sample->period and saves this new sample
in the new perf.data.
./perf record -e sched:sched_switch -g --filter 'prev_state == 1' \
-e sched:sched_switch_finish --filter 'state == 1' -P ~/foo
./perf inject -i ~/perf.data -o perf.data.new -s
A test program:
#include<unistd.h>
#include<time.h>
#include<sys/select.h>
int main()
{
struct timespec ts1;
struct timeval tv1;
int i;
long s;
for (i = 0; i< 2; i++) {
ts1.tv_sec = 0;
ts1.tv_nsec = 10000000;
nanosleep(&ts1, NULL);
tv1.tv_sec = 0;
tv1.tv_usec = 40000;
select(0, NULL, NULL, NULL,&tv1);
if (i % 100 == 0) {
s = time(NULL);
while (time(NULL) - s < 4);
}
}
return 1;
}
Profiling results:
# Overhead Command Shared Object Symbol
# ........ ....... ................. ......................
#
100.00% foo [kernel.kallsyms] [k] finish_task_switch
|
--- finish_task_switch
__schedule
schedule
|
|--79.85%-- schedule_hrtimeout_range_clock
| schedule_hrtimeout_range
| poll_schedule_timeout
| do_select
| core_sys_select
| sys_select
| system_call
| __select
| __libc_start_main
|
--20.15%-- do_nanosleep
hrtimer_nanosleep
sys_nanosleep
system_call
__GI___libc_nanosleep
__libc_start_main
Andrew Vagin (5):
event: add an event sched_switch_finish
sched: send the event sched_switch before perf_event_task_sched_out
sched: save a previous state on task_struct
perf: teach "perf inject" to work with files
perf: teach perf inject to merge sched_switch* events
include/linux/sched.h | 3 +
include/trace/events/sched.h | 29 ++++++++++
kernel/sched/core.c | 3 +-
tools/perf/builtin-inject.c | 119 +++++++++++++++++++++++++++++++++++++++---
4 files changed, 146 insertions(+), 8 deletions(-)
next reply other threads:[~2012-06-04 8:13 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-04 8:09 Andrew Vagin [this message]
2012-06-04 8:09 ` [PATCH 1/5] sched: event: add trace events when a task starts executing on a cpu Andrew Vagin
2012-06-04 8:09 ` [PATCH 2/5] sched: send the event sched_switch before perf_event_task_sched_out Andrew Vagin
2012-06-04 8:09 ` [PATCH 3/5] sched: save a previous state on task_struct Andrew Vagin
2012-06-04 8:09 ` [PATCH 4/5] perf: teach "perf inject" to work with files Andrew Vagin
2012-06-04 8:09 ` [PATCH 5/5] perf: teach perf inject to merge sched_switch* events for profiling sleep times Andrew Vagin
2012-06-04 12:40 ` [RFC] [PATCH 0/5] Teach perf tool to profile sleep times (V4) Peter Zijlstra
2012-06-04 12:50 ` Peter Zijlstra
2012-06-04 13:46 ` Steven Rostedt
2012-06-04 17:01 ` Arun Sharma
2012-06-04 20:52 ` Peter Zijlstra
2012-06-06 8:32 ` Ingo Molnar
2012-06-12 13:15 ` Andrew Wagin
2012-06-12 17:46 ` Peter Zijlstra
2012-06-04 17:03 ` Arun Sharma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1338797382-287275-1-git-send-email-avagin@openvz.org \
--to=avagin@openvz.org \
--cc=OSTRIKOV@nvidia.com \
--cc=a.p.zijlstra@chello.nl \
--cc=asharma@fb.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).