From: Milian Wolff <mail@milianw.de>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>,
linux-perf-users <linux-perf-users@vger.kernel.org>,
Namhyung Kim <namhyung@gmail.com>, Ingo Molnar <mingo@kernel.org>,
Joseph Schuchart <joseph.schuchart@tu-dresden.de>
Subject: Re: Perf event for Wall-time based sampling?
Date: Fri, 19 Sep 2014 10:11:21 +0200 [thread overview]
Message-ID: <5770573.p6hpIeRPQV@milian-kdab2> (raw)
In-Reply-To: <20140918203625.GM2770@kernel.org>
On Thursday 18 September 2014 17:36:25 Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 18, 2014 at 02:17:49PM -0600, David Ahern escreveu:
> > On 9/18/14, 1:17 PM, Arnaldo Carvalho de Melo wrote:
> > >>This was also why I asked my initial question, which I want to repeat
> > >>once
> > >>
> > >>>more: Is there a technical reason to not offer a "timer" software event
> > >>>to
> > >>>perf? I'm a complete layman when it comes to Kernel internals, but from
> > >>>a user point of view this would be awesome:
> > >>>
> > >>>perf record --call-graph dwarf -e sw-timer -F 100 someapplication
> > >>>
> > >>>This command would then create a timer in the kernel with a 100Hz
> > >>>frequency. Whenever it fires, the callgraphs of all threads in
> > >>>$someapplication are sampled and written to perf.data. Is this
> > >>>technically not feasible? Or is it simply not implemented?
> > >>>I'm experimenting with a libunwind based profiler, and with some ugly
> > >>>signal hackery I can now grab backtraces by sending my application
> > >>>SIGUSR1. Based on> >
> > >Humm, can't you do the same thing with perf? I.e. you send SIGUSR1 to
> > >your app with the frequency you want, and then hook a 'perf probe' into
> > >your signal... /me tries some stuff, will get back with results...
That is actually a very good idea. With the more powerful scripting abilities
in perf now, that could/should do the job indeed. I'll also try this out.
> > Current profiling options with perf require the process to be running.
> > What
>
> Ok, so you want to see what is the wait channel and unwind the stack
> from there? Is that the case? I.e. again, a sysrq-t equivalent?
Hey again :)
If I'm not mistaken, I could not yet bring my point across. The final goal is
to profile both, wait time _and_ CPU time, combined! By sampling a userspace
program with a constant frequency and some statistics one can get extremely
useful information out of the data. I want to use perf to automate the process
that Mike Dunlavey explains here: http://stackoverflow.com/a/378024/35250
So yes, I want to see the wait channel and unwind from there, if the process
is waiting. Otherwise just do the normal unwinding you'd do when any other
perf event occurs.
> > Milian want is to grab samples every timer expiration even if process is
> > not running.
>
> What for? And by "grab samples" you want to know where it is waiting for
> something, together with its callchain?
See above. If I can sample the callchain every N ms, preferrably in a per-
thread manner, I can create tools which find the slowest userspace functions.
This is based on "inclusive" cost, which is made up out of CPU time _and_ wait
time combined. With it one will find all of the following:
- CPU hotspots
- IO wait time
- lock contention in a multi-threaded application
> > Any limitations that would prevent doing this with a sw event? e.g, mimic
> > task-clock just don't disable the timer when the task is scheduled out.
>
> I'm just trying to figure out if what people want is a complete
> backtrace of all threads in a process, no matter what they are doing, at
> some given time, i.e. at "sysrq-t" time, is that the case?
Yes, that sounds correct to me. I tried it out on my system, but the output in
dmesg is far too large and I could not find the call stack information of my
test application while it slept. Looking at the other backtraces I see there,
I'm not sure whether sysrq-t only outputs a kernel-backtrace? Or maybe it's
because I'm missing debug symbols for these applications?
[167576.049113] mysqld S 0000000000000000 0 17023 1
0x00000000
[167576.049115] ffff8801222a7ad0 0000000000000082 ffff8800aa8c9460
0000000000014580
[167576.049117] ffff8801222a7fd8 0000000000014580 ffff8800aa8c9460
ffff8801222a7a18
[167576.049119] ffffffff812a2640 ffff8800bb975c40 ffff880237cd4ef0
ffff8801222a7b20
[167576.049121] Call Trace:
[167576.049124] [<ffffffff812a2640>] ? cpumask_next_and+0x30/0x50
[167576.049126] [<ffffffff810b65b4>] ? add_wait_queue+0x44/0x50
[167576.049128] [<ffffffff8152ceb9>] schedule+0x29/0x70
[167576.049130] [<ffffffff8152c4ad>] schedule_hrtimeout_range_clock+0x3d/0x60
[167576.049132] [<ffffffff8152c4e3>] schedule_hrtimeout_range+0x13/0x20
[167576.049134] [<ffffffff811d5589>] poll_schedule_timeout+0x49/0x70
[167576.049136] [<ffffffff811d6d5e>] do_sys_poll+0x4be/0x570
[167576.049138] [<ffffffff81155880>] ? __alloc_pages_nodemask+0x180/0xc20
[167576.049140] [<ffffffff8108e3ae>] ? alloc_pid+0x2e/0x4c0
[167576.049142] [<ffffffff811d5700>] ? poll_select_copy_remaining+0x150/0x150
[167576.049145] [<ffffffff810bd1f0>] ? cpuacct_charge+0x50/0x60
[167576.049147] [<ffffffff811a5de3>] ? kmem_cache_alloc+0x143/0x170
[167576.049149] [<ffffffff8108e3ae>] ? alloc_pid+0x2e/0x4c0
[167576.049151] [<ffffffff810a7d08>] ? __enqueue_entity+0x78/0x80
[167576.049153] [<ffffffff810addee>] ? enqueue_entity+0x24e/0xaa0
[167576.049155] [<ffffffff812a2640>] ? cpumask_next_and+0x30/0x50
[167576.049158] [<ffffffff810ae74d>] ? enqueue_task_fair+0x10d/0x5b0
[167576.049160] [<ffffffff81045c3d>] ? native_smp_send_reschedule+0x4d/0x70
[167576.049162] [<ffffffff8109e740>] ? resched_task+0xc0/0xd0
[167576.049165] [<ffffffff8109fff7>] ? check_preempt_curr+0x57/0x90
[167576.049167] [<ffffffff810a2e2b>] ? wake_up_new_task+0x11b/0x1c0
[167576.049169] [<ffffffff8106d5c6>] ? do_fork+0x136/0x3f0
[167576.049171] [<ffffffff811df235>] ? __fget_light+0x25/0x70
[167576.049173] [<ffffffff811d6f37>] SyS_poll+0x97/0x120
[167576.049176] [<ffffffff81530d69>] ? system_call_fastpath+0x16/0x1b
[167576.049178] [<ffffffff81530d69>] system_call_fastpath+0x16/0x1b
I'm interested in the the other side of the fence. The backtrace could/should
stop at system_call_fastpath.
Bye
--
Milian Wolff
mail@milianw.de
http://milianw.de
next prev parent reply other threads:[~2014-09-19 8:11 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-18 12:32 Perf event for Wall-time based sampling? Milian Wolff
2014-09-18 13:23 ` Arnaldo Carvalho de Melo
2014-09-18 13:41 ` Milian Wolff
2014-09-18 14:51 ` Arnaldo Carvalho de Melo
2014-09-18 15:26 ` Milian Wolff
2014-09-18 15:57 ` Arnaldo Carvalho de Melo
2014-09-18 16:37 ` Milian Wolff
2014-09-18 19:17 ` Arnaldo Carvalho de Melo
2014-09-18 19:31 ` Arnaldo Carvalho de Melo
2014-09-18 20:17 ` David Ahern
2014-09-18 20:36 ` Arnaldo Carvalho de Melo
2014-09-18 20:39 ` David Ahern
2014-09-19 8:11 ` Milian Wolff [this message]
2014-09-19 9:08 ` Milian Wolff
2014-09-19 14:47 ` Arnaldo Carvalho de Melo
2014-09-19 15:04 ` David Ahern
2014-09-19 15:05 ` Milian Wolff
2014-09-19 14:17 ` David Ahern
2014-09-19 14:39 ` Milian Wolff
2014-09-19 14:55 ` David Ahern
2014-09-19 5:59 ` Namhyung Kim
2014-09-19 14:33 ` Arnaldo Carvalho de Melo
2014-09-19 14:53 ` Milian Wolff
2014-09-19 15:50 ` Namhyung Kim
2014-09-22 7:56 ` Namhyung Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5770573.p6hpIeRPQV@milian-kdab2 \
--to=mail@milianw.de \
--cc=acme@kernel.org \
--cc=dsahern@gmail.com \
--cc=joseph.schuchart@tu-dresden.de \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=namhyung@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).