public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tom Zanussi <tom.zanussi@linux.intel.com>,
	Jovi Zhangwei <jovi.zhangwei@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH tip 0/5] tracing filters with BPF
Date: Tue, 03 Dec 2013 19:34:21 +0900	[thread overview]
Message-ID: <529DB3AD.4070305@hitachi.com> (raw)
In-Reply-To: <1386044930-15149-1-git-send-email-ast@plumgrid.com>

(2013/12/03 13:28), Alexei Starovoitov wrote:
> Hi All,
> 
> the following set of patches adds BPF support to trace filters.
> 
> Trace filters can be written in C and allow safe read-only access to any
> kernel data structure. Like systemtap but with safety guaranteed by kernel.
> 
> The user can do:
> cat bpf_program > /sys/kernel/debug/tracing/.../filter
> if tracing event is either static or dynamic via kprobe_events.

Oh, thank you for this great work! :D

> 
> The filter program may look like:
> void filter(struct bpf_context *ctx)
> {
>         char devname[4] = "eth5";
>         struct net_device *dev;
>         struct sk_buff *skb = 0;
> 
>         dev = (struct net_device *)ctx->regs.si;
>         if (bpf_memcmp(dev->name, devname, 4) == 0) {
>                 char fmt[] = "skb %p dev %p eth5\n";
>                 bpf_trace_printk(fmt, skb, dev, 0, 0);
>         }
> }
> 
> The kernel will do static analysis of bpf program to make sure that it cannot
> crash the kernel (doesn't have loops, valid memory/register accesses, etc).
> Then kernel will map bpf instructions to x86 instructions and let it
> run in the place of trace filter.
> 
> To demonstrate performance I did a synthetic test:
>         dev = init_net.loopback_dev;
>         do_gettimeofday(&start_tv);
>         for (i = 0; i < 1000000; i++) {
>                 struct sk_buff *skb;
>                 skb = netdev_alloc_skb(dev, 128);
>                 kfree_skb(skb);
>         }
>         do_gettimeofday(&end_tv);
>         time = end_tv.tv_sec - start_tv.tv_sec;
>         time *= USEC_PER_SEC;
>         time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
> 
>         printk("1M skb alloc/free %lld (usecs)\n", time);
> 
> no tracing
> [   33.450966] 1M skb alloc/free 145179 (usecs)
> 
> echo 1 > enable
> [   97.186379] 1M skb alloc/free 240419 (usecs)
> (tracing slows down kfree_skb() due to event_buffer_lock/buffer_unlock_commit)
> 
> echo 'name==eth5' > filter
> [  139.644161] 1M skb alloc/free 302552 (usecs)
> (running filter_match_preds() for every skb and discarding
> event_buffer is even slower)
> 
> cat bpf_prog > filter
> [  171.150566] 1M skb alloc/free 199463 (usecs)
> (JITed bpf program is safely checking dev->name == eth5 and discarding)
> 
> echo 0 > enable
> [  258.073593] 1M skb alloc/free 144919 (usecs)
> (tracing is disabled, performance is back to original)
> 
> The C program compiled into BPF and then JITed into x86 is faster than
> filter_match_preds() approach (199-145 msec vs 302-145 msec)

Great! :)

> tracing+bpf is a tool for safe read-only access to variables without recompiling
> the kernel and without affecting running programs.

Hmm, this feature and trace-event trigger actions can give us
powerful on-the-fly scripting functionality...

> BPF filters can be written manually (see tools/bpf/trace/filter_ex1.c)
> or better compiled from restricted C via GCC or LLVM
> 
> Q: What is the difference between existing BPF and extended BPF?
> A:
> Existing BPF insn from uapi/linux/filter.h
> struct sock_filter {
>         __u16   code;   /* Actual filter code */
>         __u8    jt;     /* Jump true */
>         __u8    jf;     /* Jump false */
>         __u32   k;      /* Generic multiuse field */
> };
> 
> Extended BPF insn from linux/bpf.h
> struct bpf_insn {
>         __u8    code;    /* opcode */
>         __u8    a_reg:4; /* dest register*/
>         __u8    x_reg:4; /* source register */
>         __s16   off;     /* signed offset */
>         __s32   imm;     /* signed immediate constant */
> };
> 
> opcode encoding is the same between old BPF and extended BPF.
> Original BPF has two 32-bit registers.
> Extended BPF has ten 64-bit registers.
> That is the main difference.
> 
> Old BPF was using jt/jf fields for jump-insn only.
> New BPF combines them into generic 'off' field for jump and non-jump insns.
> k==imm field has the same meaning.

Looks very interesting. :)

Thank you!

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



  parent reply	other threads:[~2013-12-03 10:34 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-03  4:28 [RFC PATCH tip 0/5] tracing filters with BPF Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 1/5] Extended BPF core framework Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 2/5] Extended BPF JIT for x86-64 Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 3/5] Extended BPF (64-bit BPF) design document Alexei Starovoitov
2013-12-03 17:01   ` H. Peter Anvin
2013-12-03 19:59     ` Alexei Starovoitov
2013-12-03 20:41       ` Frank Ch. Eigler
2013-12-03 21:31         ` Alexei Starovoitov
2013-12-04  9:24           ` Ingo Molnar
2013-12-03  4:28 ` [RFC PATCH tip 4/5] use BPF in tracing filters Alexei Starovoitov
2013-12-04  0:48   ` Masami Hiramatsu
2013-12-04  1:11     ` Steven Rostedt
2013-12-05  0:05       ` Masami Hiramatsu
2013-12-05  5:11         ` Alexei Starovoitov
2013-12-06  8:43           ` Masami Hiramatsu
2013-12-06 10:05             ` Jovi Zhangwei
2013-12-06 23:48               ` Masami Hiramatsu
2013-12-08 18:22                 ` Frank Ch. Eigler
2013-12-09 10:12                   ` Masami Hiramatsu
2013-12-03  4:28 ` [RFC PATCH tip 5/5] tracing filter examples in BPF Alexei Starovoitov
2013-12-04  0:35   ` Jonathan Corbet
2013-12-04  1:21     ` Alexei Starovoitov
2013-12-03  9:16 ` [RFC PATCH tip 0/5] tracing filters with BPF Ingo Molnar
2013-12-03 15:33   ` Steven Rostedt
2013-12-03 18:26     ` Alexei Starovoitov
2013-12-04  1:13       ` Masami Hiramatsu
2013-12-09  7:29         ` Namhyung Kim
2013-12-09  9:51           ` Masami Hiramatsu
2013-12-03 18:06   ` Alexei Starovoitov
2013-12-04  9:34     ` Ingo Molnar
2013-12-04 17:36       ` Alexei Starovoitov
2013-12-05 10:38         ` Ingo Molnar
2013-12-06  5:43           ` Alexei Starovoitov
2013-12-03 10:34 ` Masami Hiramatsu [this message]
2013-12-04  0:01 ` Andi Kleen
2013-12-04  3:09   ` Alexei Starovoitov
2013-12-05  4:40     ` Alexei Starovoitov
2013-12-05 10:41       ` Ingo Molnar
2013-12-05 13:46         ` Steven Rostedt
2013-12-05 22:36           ` Alexei Starovoitov
2013-12-05 23:37             ` Steven Rostedt
2013-12-06  4:49               ` Alexei Starovoitov
2013-12-10 15:47                 ` Ingo Molnar
2013-12-11  2:32                   ` Alexei Starovoitov
2013-12-11  3:35                     ` Masami Hiramatsu
2013-12-12  2:48                       ` Alexei Starovoitov
2013-12-05 16:11       ` Frank Ch. Eigler
2013-12-05 19:43         ` Alexei Starovoitov
2013-12-06  0:14       ` Andi Kleen
2013-12-06  1:10         ` H. Peter Anvin
2013-12-06  1:20           ` Andi Kleen
2013-12-06  1:28             ` H. Peter Anvin
2013-12-06 21:43               ` Frank Ch. Eigler
2013-12-06  5:16             ` Alexei Starovoitov
2013-12-06 23:54               ` Masami Hiramatsu
2013-12-07  1:01                 ` Alexei Starovoitov
2013-12-06  5:46             ` Jovi Zhangwei
2013-12-07  1:12             ` Alexei Starovoitov
2013-12-07 16:53               ` Jovi Zhangwei
2013-12-06  5:19       ` Jovi Zhangwei
2013-12-06 23:58         ` Masami Hiramatsu
2013-12-07 16:21           ` Jovi Zhangwei
2013-12-09  4:59             ` Masami Hiramatsu
2013-12-06  6:17       ` Jovi Zhangwei
2013-12-05 16:31   ` Frank Ch. Eigler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=529DB3AD.4070305@hitachi.com \
    --to=masami.hiramatsu.pt@hitachi.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=ast@plumgrid.com \
    --cc=edumazet@google.com \
    --cc=hpa@zytor.com \
    --cc=jovi.zhangwei@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tom.zanussi@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox