All of lore.kernel.org
 help / color / mirror / Atom feed
From: Milian Wolff <milian.wolff@kdab.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-perf-users@vger.kernel.org, Jiri Olsa <jolsa@kernel.org>,
	David Ahern <dsahern@gmail.com>,
	Brendan Gregg <brendan.d.gregg@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Wang Nan <wangnan0@huawei.com>
Subject: Re: [PATCH] perf trace: Add support for printing call chains on sys_exit events.
Date: Sat, 09 Apr 2016 13:44:51 +0200	[thread overview]
Message-ID: <1714692.A55s9sBUmI@agathebauer> (raw)
In-Reply-To: <20160408181853.GC25165@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 4966 bytes --]

On Freitag, 8. April 2016 15:18:53 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff escreveu:
> > > Now, one can print the call chain for every encountered sys_exit
> > > event, e.g.:
> > > 
> > > Note that it is advised to increase the number of mmap pages to
> > > prevent event losses when using this new feature. Often, adding
> > > `-m 10M` to the `perf trace` invocation is enough.
> > > 
> > > This feature is also available in strace when built with libunwind
> > > 
> > > via `strace -k`. Performance wise, this solution is much better:
> > >     $ time find path/to/linux &> /dev/null
> > >     
> > >     real    0m0.051s
> > >     user    0m0.013s
> > >     sys     0m0.037s
> > >     
> > >     $ time perf trace -m 800M --call-graph dwarf find path/to/linux &>
> > >     /dev/null
> > >     
> > >     real    0m2.624s
> > >     user    0m1.203s
> > >     sys     0m1.333s
> > >     
> > >     $ time strace -k find path/to/linux  &> /dev/null
> > >     
> > >     real    0m35.398s
> > >     user    0m10.403s
> > >     sys     0m23.173s
> > > 
> > > Note that it is currently not possible to configure the print output.
> > > Adding such a feature, similar to what is available in `perf script`
> > > via its `--fields` knob can be added later on.
> > 
> > You mixed up multiple changes in one single patch, I'll break it down
> > while testing, and before pushing upstream.
> 
> Expanding a bit the audience:
> 
> First test, it works, great! But do we really need that address? I guess
> not, right, perhaps via some callchain parameter, to tell what we want to
> see? But by default knowing the function name + DSO seems enough, no?

...

> Yeah, you agree with that, now that I read the patch 8-):
> 
> +               /* TODO: user-configurable print_opts */
> +               unsigned int print_opts = PRINT_IP_OPT_IP

;-)

I even tried to make the code of `perf script` reusable for `perf trace`, but 
stopped once I realised that it currently relies on the existance of a 
`perf_session`, which does not exist when we do live tracing. It only exists 
for replaying in `builtin-trace.c`. So it involves some more refactoring which 
I did not have the time for.

<snip>

> Better, but perhaps we should try aligning, up to a limit, the function
> names/DSOs?
> 
> [root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
>      0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70          
>                              ) = 0 syscall_slow_exit_work
> ([kernel.kallsyms]) do_syscall_64          ([kernel.kallsyms])
> return_from_SYSCALL_64 ([kernel.kallsyms]) __nanosleep           
> (/usr/lib64/libc-2.22.so) usleep                 (/usr/lib64/libc-2.22.so)
> [unknown]              (/usr/bin/usleep) __libc_start_main     
> (/usr/lib64/libc-2.22.so) [unknown]              (/usr/bin/usleep)
> [root@jouet bpf]#
> 
> wdyt?

Yes, sounds good. Many profilers I've worked with always dump the IP, so I 
thought we should do it here as well. `perf script` e.g. does it. Could we 
maybe print the IP if the symbol is [unknown]?

> Also, after this initial support is in, I think the next step is to
> allow per syscall configs, like we have for per tracepoints, i.e. this
> should be possible:
> 
>    # trace -e nanosleep(call-graph=dwarf),socket -a
> 
> And then we would get callchains just for nanosleep calls, not for
> socket ones. We then need to think how to ask that efficiently to the
> kernel, in this case it should be instead of using
> raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
> syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
> syscalls:sys_{enter,exit}_socket, without.
> 
> Doing it this way allows us to avoid asking callchains for a lot of
> events when we want just for a few ones, to reduce overhead.
 
Yep, sounds useful for some more specific use-cases. For me, this patch is 
sufficient as I'd just do:

    $ trace -e nanosleep --call-graph=dwarf ...

What I think is more important though is to make sure we only ask for 
callchains on the sys_exit events. Afaik, my patch will do it also for the 
sys_enter which is just additional cost with no benefit? So fixing that first 
is I think even more important, but I don't know how.

> Anyway, I think I'll just break this down into multiple patches and then
> we can work on these other aspects.

Yes, but note that I'll be busy and then on vacation for the next two weeks. 
I'll get back to this after wards.

> David, ah, his patch floated on the linux-perf-users mailing list, easy
> one once the thread->priv one got out of the way (it was being used by
> builtin-trace.c and the unwind code, ugh).
> 
> Thanks,

Same to you, cheers!

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5903 bytes --]

  reply	other threads:[~2016-04-09 11:44 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-08 11:34 [PATCH] perf trace: Add support for printing call chains on sys_exit events Milian Wolff
2016-04-08 17:57 ` Arnaldo Carvalho de Melo
2016-04-08 18:18   ` Arnaldo Carvalho de Melo
2016-04-09 11:44     ` Milian Wolff [this message]
2016-04-09 11:38   ` Milian Wolff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1714692.A55s9sBUmI@agathebauer \
    --to=milian.wolff@kdab.com \
    --cc=acme@kernel.org \
    --cc=brendan.d.gregg@gmail.com \
    --cc=dsahern@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=namhyung@kernel.org \
    --cc=wangnan0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.