From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Tom Zanussi <tom.zanussi@linux.intel.com>,
Jovi Zhangwei <jovi.zhangwei@gmail.com>,
Eric Dumazet <edumazet@google.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH tip 0/5] tracing filters with BPF
Date: Tue, 03 Dec 2013 19:34:21 +0900
Message-ID: <529DB3AD.4070305@hitachi.com>
In-Reply-To: <1386044930-15149-1-git-send-email-ast@plumgrid.com>
(2013/12/03 13:28), Alexei Starovoitov wrote:
> Hi All,
>
> the following set of patches adds BPF support to trace filters.
>
> Trace filters can be written in C and allow safe read-only access to any
> kernel data structure. Like systemtap but with safety guaranteed by kernel.
>
> The user can do:
> cat bpf_program > /sys/kernel/debug/tracing/.../filter
> if tracing event is either static or dynamic via kprobe_events.
Oh, thank you for this great work! :D
>
> The filter program may look like:
> void filter(struct bpf_context *ctx)
> {
> char devname[4] = "eth5";
> struct net_device *dev;
> struct sk_buff *skb = 0;
>
> dev = (struct net_device *)ctx->regs.si;
> if (bpf_memcmp(dev->name, devname, 4) == 0) {
> char fmt[] = "skb %p dev %p eth5\n";
> bpf_trace_printk(fmt, skb, dev, 0, 0);
> }
> }
>
> The kernel will do static analysis of bpf program to make sure that it cannot
> crash the kernel (doesn't have loops, valid memory/register accesses, etc).
> Then kernel will map bpf instructions to x86 instructions and let it
> run in the place of trace filter.
>
> To demonstrate performance I did a synthetic test:
> dev = init_net.loopback_dev;
> do_gettimeofday(&start_tv);
> for (i = 0; i < 1000000; i++) {
> struct sk_buff *skb;
> skb = netdev_alloc_skb(dev, 128);
> kfree_skb(skb);
> }
> do_gettimeofday(&end_tv);
> time = end_tv.tv_sec - start_tv.tv_sec;
> time *= USEC_PER_SEC;
> time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
>
> printk("1M skb alloc/free %lld (usecs)\n", time);
>
> no tracing
> [ 33.450966] 1M skb alloc/free 145179 (usecs)
>
> echo 1 > enable
> [ 97.186379] 1M skb alloc/free 240419 (usecs)
> (tracing slows down kfree_skb() due to event_buffer_lock/buffer_unlock_commit)
>
> echo 'name==eth5' > filter
> [ 139.644161] 1M skb alloc/free 302552 (usecs)
> (running filter_match_preds() for every skb and discarding
> event_buffer is even slower)
>
> cat bpf_prog > filter
> [ 171.150566] 1M skb alloc/free 199463 (usecs)
> (JITed bpf program is safely checking dev->name == eth5 and discarding)
>
> echo 0 > enable
> [ 258.073593] 1M skb alloc/free 144919 (usecs)
> (tracing is disabled, performance is back to original)
>
> The C program compiled into BPF and then JITed into x86 is faster than
> filter_match_preds() approach (199-145 msec vs 302-145 msec)
Great! :)
> tracing+bpf is a tool for safe read-only access to variables without recompiling
> the kernel and without affecting running programs.
Hmm, combining this feature with trace-event trigger actions could give us
powerful on-the-fly scripting functionality...
> BPF filters can be written manually (see tools/bpf/trace/filter_ex1.c)
> or better compiled from restricted C via GCC or LLVM
>
> Q: What is the difference between existing BPF and extended BPF?
> A:
> Existing BPF insn from uapi/linux/filter.h
> struct sock_filter {
> __u16 code; /* Actual filter code */
> __u8 jt; /* Jump true */
> __u8 jf; /* Jump false */
> __u32 k; /* Generic multiuse field */
> };
>
> Extended BPF insn from linux/bpf.h
> struct bpf_insn {
> __u8 code; /* opcode */
> __u8 a_reg:4; /* dest register*/
> __u8 x_reg:4; /* source register */
> __s16 off; /* signed offset */
> __s32 imm; /* signed immediate constant */
> };
>
> opcode encoding is the same between old BPF and extended BPF.
> Original BPF has two 32-bit registers.
> Extended BPF has ten 64-bit registers.
> That is the main difference.
>
> Old BPF was using jt/jf fields for jump-insn only.
> New BPF combines them into generic 'off' field for jump and non-jump insns.
> k==imm field has the same meaning.
Looks very interesting. :)
Thank you!
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com