public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* LBR callchains from tracepoints
       [not found]     ` <20160426010358.GD16708@kernel.org>
@ 2016-04-26  1:24       ` Arnaldo Carvalho de Melo
  2016-04-26 16:38         ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-04-26  1:24 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, David Ahern, Milian Wolff,
	Frédéric Weisbecker, Ingo Molnar, Namhyung Kim,
	Linux Kernel Mailing List

Em Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> I now need to continue investigation why this doesn't seem to work from
> tracepoints...

Bummer, the changeset (at the end of this message) hasn't any
explanation, is this really impossible? I.e. LBR callstacks from
tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel?

I've read somewhere that LBR wouldn't work for the kernel, but when I
tried, for cycles:ppp I got:

[acme@jouet linux]$ perf record --call-graph lbr usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (9 samples) ]
[acme@jouet linux]$ perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000,
sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1,
precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec:
1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
[acme@jouet linux]$

-   59.69%  usleep   [kernel]  [k] vma_interval_tree_insert
    →vma_interval_tree_insert  [kernel]
     vma_adjust                [kernel]
     __split_vma.isra.31       [kernel]
     split_vma                 [kernel]
     mprotect_fixup            [kernel]
     sys_mprotect              [kernel]
     entry_SYSCALL_64_fastpath [kernel]
     mprotect                  ld-2.22.so
     _dl_relocate_object       ld-2.22.so
     memcpy@GLIBC_2.2.5        libc-2.22.so
     _dl_relocate_object       ld-2.22.so
     __gettimeofday            libc-2.22.so
     _dl_vdso_vsym             libc-2.22.so
     _dl_lookup_symbol_x       ld-2.22.so

This was done on a Broadwell system (ThinkPad t450s).

- Arnaldo

commit 2481c5fa6db0237e4f0168f88913178b2b495b7c
Author: Stephane Eranian <eranian@google.com>
Date:   Thu Feb 9 23:20:59 2012 +0100

    perf: Disable PERF_SAMPLE_BRANCH_* when not supported
    
    PERF_SAMPLE_BRANCH_* is disabled for:
    
     - SW events (sw counters, tracepoints)
     - HW breakpoints
     - ALL but Intel x86 architecture
     - AMD64 processors
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: LBR callchains from tracepoints
  2016-04-26  1:24       ` LBR callchains from tracepoints Arnaldo Carvalho de Melo
@ 2016-04-26 16:38         ` Peter Zijlstra
  2016-04-26 17:26           ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2016-04-26 16:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, David Ahern, Milian Wolff,
	Frédéric Weisbecker, Ingo Molnar, Namhyung Kim,
	Linux Kernel Mailing List

On Mon, Apr 25, 2016 at 10:24:31PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> > I now need to continue investigation why this doesn't seem to work from
> > tracepoints...
> 
> Bummer, the changeset (at the end of this message) hasn't any
> explanation, is this really impossible? I.e. LBR callstacks from
> tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel?

Could maybe be done, but its tricky to implement as the LBR is managed
by the hardware PMU and tracepoints are a software PMU, so we need to
then somehow frob with cross-pmu resources, in a very arch specific way.
And programmability of the hardware PMU will then depend on events
outside of it.

All rather icky.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: LBR callchains from tracepoints
  2016-04-26 16:38         ` Peter Zijlstra
@ 2016-04-26 17:26           ` Alexei Starovoitov
  2016-04-26 18:20             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2016-04-26 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Stephane Eranian, David Ahern,
	Milian Wolff, Frédéric Weisbecker, Ingo Molnar,
	Namhyung Kim, Linux Kernel Mailing List

On Tue, Apr 26, 2016 at 06:38:28PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 25, 2016 at 10:24:31PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > I now need to continue investigation why this doesn't seem to work from
> > > tracepoints...
> > 
> > Bummer, the changeset (at the end of this message) hasn't any
> > explanation, is this really impossible? I.e. LBR callstacks from
> > tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel?
> 
> Could maybe be done, but its tricky to implement as the LBR is managed
> by the hardware PMU and tracepoints are a software PMU, so we need to
> then somehow frob with cross-pmu resources, in a very arch specific way.
> And programmability of the hardware PMU will then depend on events
> outside of it.

btw we're thinking to add support for lbr to bpf, so that from the program
we can get accurate and fast stacks. That's especially important for user
space stacks. No clear idea how to do it yet, but it would be really useful.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: LBR callchains from tracepoints
  2016-04-26 17:26           ` Alexei Starovoitov
@ 2016-04-26 18:20             ` Arnaldo Carvalho de Melo
  2016-04-26 19:07               ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-04-26 18:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Stephane Eranian, David Ahern, Milian Wolff,
	Frédéric Weisbecker, Ingo Molnar, Namhyung Kim,
	Linux Kernel Mailing List

Em Tue, Apr 26, 2016 at 10:26:32AM -0700, Alexei Starovoitov escreveu:
> On Tue, Apr 26, 2016 at 06:38:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 25, 2016 at 10:24:31PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > I now need to continue investigation why this doesn't seem to work from
> > > > tracepoints...

> > > Bummer, the changeset (at the end of this message) hasn't any
> > > explanation, is this really impossible? I.e. LBR callstacks from
> > > tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel?

> > Could maybe be done, but its tricky to implement as the LBR is managed
> > by the hardware PMU and tracepoints are a software PMU, so we need to
> > then somehow frob with cross-pmu resources, in a very arch specific way.
> > And programmability of the hardware PMU will then depend on events
> > outside of it.
 
> btw we're thinking to add support for lbr to bpf, so that from the program
> we can get accurate and fast stacks. That's especially important for user
> space stacks. No clear idea how to do it yet, but it would be really useful.

Yeah, and that already works in perf, its just that it doesn't work from some
points (PERF_TYPE_SOFTWARE, PERF_TYPE_TRACEPOINT, etc), as described in the
changeset I mentioned.

'perf trace --call-graph lbr' doesn't work right now even with it
interested only in the user space bits, i.e. setting
perf_event_attr.exclude_callchain_kernel.

  # perf trace --call-graph dwarf

works, but that, as you mention, really isn't "fast" and sometimes not
accurate, or at least wasn't with broken toolchains.

Example of mixed strace-like with userspace-only DWARF callchains (would be
lovely if this was with LBR, huh?) plus fp callchains for the
sched:sched_switch tracepoint plus LBR callchains for a hardware event, cycles,
look further below for the reason of the broken timestamps for
PERF_TYPE_HARDWARE events:

  # perf trace -e nanosleep --event sched:sched_switch/call-graph=fp/ --ev cycles/call-graph=lbr,period=100/ usleep 1
18446744073709.551 (         ): cycles/call-graph=lbr,period=100/:)
                                       __intel_pmu_enable_all+0xfe200080 ([kernel.kallsyms])
                                       intel_pmu_enable_all+0xfe200010 ([kernel.kallsyms])
                                       x86_pmu_enable+0xfe200271 ([kernel.kallsyms])
                                       perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms])
                                       ctx_resched+0xfe20007a ([kernel.kallsyms])
                                       perf_event_exec+0xfe20011d ([kernel.kallsyms])
                                       setup_new_exec+0xfe20006f ([kernel.kallsyms])
                                       load_elf_binary+0xfe2003e3 ([kernel.kallsyms])
                                       search_binary_handler+0xfe20009e ([kernel.kallsyms])
                                       do_execveat_common.isra.38+0xfe20052c ([kernel.kallsyms])
                                       sys_execve+0xfe20003a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       [0] ([unknown])
     0.310 ( 0.006 ms): usleep/20951 nanosleep(rqtp: 0x7ffda8904500       ) ...
     0.310 (         ): sched:sched_switch:usleep:20951 [120] S ==> swapper/3:0 [120])
                                       __schedule+0xfe200402 ([kernel.kallsyms])
                                       schedule+0xfe200035 ([kernel.kallsyms])
                                       do_nanosleep+0xfe20006f ([kernel.kallsyms])
                                       hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
                                       sys_nanosleep+0xfe20007a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       __nanosleep+0xffff00bfad62c010 (/usr/lib64/libc-2.22.so)
18446679523046.461 (         ): cycles/call-graph=lbr,period=100/:)
                                       perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms])
                                       __perf_event_task_sched_in+0xfe2001ad ([kernel.kallsyms])
                                       finish_task_switch+0xfe200156 ([kernel.kallsyms])
                                       __schedule+0xfe200397 ([kernel.kallsyms])
                                       schedule+0xfe200035 ([kernel.kallsyms])
                                       do_nanosleep+0xfe20006f ([kernel.kallsyms])
                                       hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
                                       sys_nanosleep+0xfe20007a ([kernel.kallsyms])
                                       do_syscall_64+0xfe200062 ([kernel.kallsyms])
                                       return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                                       [0] ([unknown])
     0.377 ( 0.073 ms): usleep/20951  ... [continued]: nanosleep()) = 0
  [root@jouet ~]# 


perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  size                             112
  { sample_period, sample_freq }   100
  sample_type                      IP|TID|CALLCHAIN|BRANCH_STACK|IDENTIFIER
  read_format                      ID
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES

missing PERF_SAMPLE_TIME, will fix.

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: LBR callchains from tracepoints
  2016-04-26 18:20             ` Arnaldo Carvalho de Melo
@ 2016-04-26 19:07               ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2016-04-26 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexei Starovoitov, Stephane Eranian, David Ahern, Milian Wolff,
	Frédéric Weisbecker, Ingo Molnar, Namhyung Kim,
	Linux Kernel Mailing List

On Tue, Apr 26, 2016 at 03:20:32PM -0300, Arnaldo Carvalho de Melo wrote:
> Yeah, and that already works in perf, its just that it doesn't work from some
> points (PERF_TYPE_SOFTWARE, PERF_TYPE_TRACEPOINT, etc), as described in the
> changeset I mentioned.

Look at it the other way around, it _only_ works for intel cpu events,
_nothing_ else.

There is only a single PMU that supports LBR callgraph thingies, all the
others do not.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-26 19:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <loom.20160419T015612-712@post.gmane.org>
     [not found] ` <57157965.4020506@gmail.com>
     [not found]   ` <1704384.MQzPvE4Oa5@milian-kdab2>
     [not found]     ` <20160426010358.GD16708@kernel.org>
2016-04-26  1:24       ` LBR callchains from tracepoints Arnaldo Carvalho de Melo
2016-04-26 16:38         ` Peter Zijlstra
2016-04-26 17:26           ` Alexei Starovoitov
2016-04-26 18:20             ` Arnaldo Carvalho de Melo
2016-04-26 19:07               ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox