From: Alexei Starovoitov <ast@fb.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Wang Nan <wangnan0@huawei.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Brendan Gregg <brendan.d.gregg@gmail.com>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH net-next 0/3] bpf_get_stackid() and stack_trace map
Date: Wed, 17 Feb 2016 19:58:56 -0800
Message-ID: <1455767939-2700534-1-git-send-email-ast@fb.com>
This patch set introduces a new map type to store stack traces and
the corresponding bpf_get_stackid() helper.
BPF programs can already walk the stack via an unrolled loop
of bpf_probe_read()s, which is ok for simple analysis, but it's
not efficient and is limited to <30 frames; beyond that the programs
don't fit into MAX_BPF_STACK. With the bpf_get_stackid() helper
the programs can collect up to PERF_MAX_STACK_DEPTH frames, both
user and kernel.
Using stack traces as a key in a map turned out to be very useful
for generating flame graphs, off-cpu graphs, waker and chain graphs.
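To illustrate the idea of keying on a stack trace, here is a toy userspace sketch, not the kernel implementation. It assigns a small integer id to each distinct trace and keeps one counter per id, which is roughly the aggregation a flame graph needs. All names (stack_counts, stack_id, count_event) and both limits are hypothetical, and the linear search is chosen only for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH  8   /* illustrative only; the kernel limit is PERF_MAX_STACK_DEPTH */
#define MAX_STACKS 16  /* hypothetical capacity of this toy table */

struct stack_entry {
	uint32_t nr;             /* number of valid frames */
	uint64_t ip[MAX_DEPTH];  /* instruction pointers, innermost first */
	uint64_t count;          /* events seen with this exact trace */
};

struct stack_counts {
	uint32_t used;
	struct stack_entry e[MAX_STACKS];
};

/* Return the id of a trace, inserting it if new; -1 if too deep or table full. */
static int stack_id(struct stack_counts *t, const uint64_t *ip, uint32_t nr)
{
	assert(ip != NULL || nr == 0);
	if (nr > MAX_DEPTH)
		return -1;
	for (uint32_t i = 0; i < t->used; i++)
		if (t->e[i].nr == nr &&
		    memcmp(t->e[i].ip, ip, nr * sizeof(*ip)) == 0)
			return (int)i;  /* identical trace: reuse its id */
	if (t->used == MAX_STACKS)
		return -1;
	t->e[t->used].nr = nr;
	memcpy(t->e[t->used].ip, ip, nr * sizeof(*ip));
	t->e[t->used].count = 0;
	return (int)t->used++;
}

/* Account one event against the trace it was observed with. */
static int count_event(struct stack_counts *t, const uint64_t *ip, uint32_t nr)
{
	int id = stack_id(t, ip, nr);

	if (id >= 0)
		t->e[id].count++;
	return id;
}
```

Two events with identical frames land on the same id and bump the same counter; a trace that differs in even one frame gets its own id.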
Patch 3 is a simplified version of the 'offwaketime' tool, which is
described in detail here:
http://brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
Earlier versions of this patch used the save_stack_trace() helper,
but 'unreliable' frames add too much noise and two equivalent
stack traces produce different 'stackid's.
Using the lockdep style of storing frames with MAX_STACK_TRACE_ENTRIES is
great for lockdep, but not acceptable for bpf, since the stack_trace
map needs to be freed when the user hits Ctrl-C to stop the tool.
The ftrace style with per_cpu(struct ftrace_stack) is great, but it's
tightly coupled with the ftrace ring buffer and has the same 'unreliable'
noise. perf_event's perf_callchain() mechanism is also very efficient,
and it needed only a minor generalization, done in patch 1,
to be used by bpf stack_trace maps.
Peter, please take a look at patch 1.
If you're ok with it, I'd like to take the whole set via net-next.
Patch 1 - generalization of perf_callchain()
Patch 2 - stack_trace map done as a lock-less hashtable without a linked
          list, to avoid a spinlock on insertion, which is on the critical
          path when the bpf_get_stackid() helper is called for every
          task switch event
Patch 3 - offwaketime example
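As a rough illustration of what a lock-less hashtable without a linked list can look like, here is a minimal userspace sketch using C11 atomics. It is not the code from patch 2. The names (stack_table, get_stackid, hash_frames), the table size, and the FNV-1a hash (a stand-in for jhash2) are all assumptions made for the example. Each slot holds at most one trace, and a colliding trace replaces the previous one with a single atomic exchange, so insertion never takes a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64u  /* hypothetical table size */
static_assert((N_BUCKETS & (N_BUCKETS - 1)) == 0, "N_BUCKETS must be a power of two");

struct bucket {
	uint32_t hash;
	uint32_t nr;    /* number of frames */
	uint64_t ip[];  /* the frames themselves */
};

struct stack_table {
	_Atomic(struct bucket *) slot[N_BUCKETS];  /* one trace per slot, no chains */
};

/* FNV-1a over the raw frame bytes; a stand-in hash, not jhash2. */
static uint32_t hash_frames(const uint64_t *ip, uint32_t nr)
{
	const unsigned char *p = (const unsigned char *)ip;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < (size_t)nr * sizeof(*ip); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Return the slot index used as the stack id, or -1 on allocation failure. */
static int get_stackid(struct stack_table *t, const uint64_t *ip, uint32_t nr)
{
	uint32_t hash = hash_frames(ip, nr);
	uint32_t id = hash & (N_BUCKETS - 1);
	struct bucket *old = atomic_load(&t->slot[id]);
	struct bucket *fresh;

	if (old && old->hash == hash && old->nr == nr &&
	    memcmp(old->ip, ip, (size_t)nr * sizeof(*ip)) == 0)
		return (int)id;  /* same trace already stored in this slot */

	fresh = malloc(sizeof(*fresh) + (size_t)nr * sizeof(*ip));
	if (!fresh)
		return -1;
	fresh->hash = hash;
	fresh->nr = nr;
	memcpy(fresh->ip, ip, (size_t)nr * sizeof(*ip));

	/* Publish the new trace; whatever was in the slot is displaced. */
	old = atomic_exchange(&t->slot[id], fresh);
	/*
	 * Simplification: freeing right away is only safe single-threaded.
	 * With concurrent readers the old bucket would need deferred
	 * reclamation (for example RCU).
	 */
	free(old);
	return (int)id;
}
```

The trade-off of dropping the chain is that two different traces hashing to the same slot cannot coexist: the newer one overwrites the older one under the same id. Whether to overwrite or to reject the newcomer is a policy choice this sketch hard-codes.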
After the patch, this is the 'perf report' for an artificial 'sched_bench'
benchmark that does pthread_cond_wait/signal while the 'offwaketime'
example is running in the background:
 16.35%  swapper      [kernel.vmlinux]    [k] intel_idle
  2.18%  sched_bench  [kernel.vmlinux]    [k] __switch_to
  2.18%  sched_bench  libpthread-2.12.so  [.] pthread_cond_signal@@GLIBC_2.3.2
  1.72%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_unlock
  1.53%  sched_bench  [kernel.vmlinux]    [k] bpf_get_stackid
  1.44%  sched_bench  [kernel.vmlinux]    [k] entry_SYSCALL_64
  1.39%  sched_bench  [kernel.vmlinux]    [k] __call_rcu.constprop.73
  1.13%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_lock
  1.07%  sched_bench  libpthread-2.12.so  [.] pthread_cond_wait@@GLIBC_2.3.2
  1.07%  sched_bench  [kernel.vmlinux]    [k] hash_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] do_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] get_futex_key_refs.isra.13
The hottest part of bpf_get_stackid() is the inlined jhash2, so we may consider
using some faster hash in the future, but it's good enough for now.
Alexei Starovoitov (3):
  perf: generalize perf_callchain
  bpf: introduce BPF_MAP_TYPE_STACK_TRACE
  samples/bpf: offwaketime example
arch/x86/include/asm/stacktrace.h | 2 +-
arch/x86/kernel/cpu/perf_event.c | 4 +-
arch/x86/kernel/dumpstack.c | 6 +-
arch/x86/kernel/stacktrace.c | 18 +--
arch/x86/oprofile/backtrace.c | 3 +-
include/linux/bpf.h | 1 +
include/linux/perf_event.h | 13 ++-
include/uapi/linux/bpf.h | 21 ++++
kernel/bpf/Makefile | 3 +
kernel/bpf/stackmap.c | 237 ++++++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 6 +-
kernel/events/callchain.c | 32 +++--
kernel/events/internal.h | 2 -
kernel/trace/bpf_trace.c | 2 +
samples/bpf/Makefile | 4 +
samples/bpf/bpf_helpers.h | 2 +
samples/bpf/offwaketime_kern.c | 131 +++++++++++++++++++++
samples/bpf/offwaketime_user.c | 185 +++++++++++++++++++++++++++++
18 files changed, 642 insertions(+), 30 deletions(-)
create mode 100644 kernel/bpf/stackmap.c
create mode 100644 samples/bpf/offwaketime_kern.c
create mode 100644 samples/bpf/offwaketime_user.c
--
2.4.6