Re: [RFC PATCH tip 0/5] tracing filters with BPF


From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tom Zanussi <tom.zanussi@linux.intel.com>,
	Jovi Zhangwei <jovi.zhangwei@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH tip 0/5] tracing filters with BPF
Date: Tue, 03 Dec 2013 19:34:21 +0900
Message-ID: <529DB3AD.4070305@hitachi.com>
In-Reply-To: <1386044930-15149-1-git-send-email-ast@plumgrid.com>

(2013/12/03 13:28), Alexei Starovoitov wrote:
> Hi All,
> 
> the following set of patches adds BPF support to trace filters.
> 
> Trace filters can be written in C and allow safe read-only access to any
> kernel data structure. Like systemtap but with safety guaranteed by kernel.
> 
> The user can do:
> cat bpf_program > /sys/kernel/debug/tracing/.../filter
> if the tracing event is either static or dynamic (via kprobe_events).

Oh, thank you for this great work! :D

> 
> The filter program may look like:
> void filter(struct bpf_context *ctx)
> {
>         char devname[4] = "eth5";
>         struct net_device *dev;
>         struct sk_buff *skb = 0;
> 
>         dev = (struct net_device *)ctx->regs.si;
>         if (bpf_memcmp(dev->name, devname, 4) == 0) {
>                 char fmt[] = "skb %p dev %p eth5\n";
>                 bpf_trace_printk(fmt, skb, dev, 0, 0);
>         }
> }
> 
> The kernel statically analyzes the BPF program to make sure that it cannot
> crash the kernel (no loops, only valid memory/register accesses, etc.).
> The kernel then maps BPF instructions to x86 instructions and runs the
> result in place of the trace filter.
> 
> To demonstrate performance I did a synthetic test:
>         dev = init_net.loopback_dev;
>         do_gettimeofday(&start_tv);
>         for (i = 0; i < 1000000; i++) {
>                 struct sk_buff *skb;
>                 skb = netdev_alloc_skb(dev, 128);
>                 kfree_skb(skb);
>         }
>         do_gettimeofday(&end_tv);
>         time = end_tv.tv_sec - start_tv.tv_sec;
>         time *= USEC_PER_SEC;
>         time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
> 
>         printk("1M skb alloc/free %lld (usecs)\n", time);
> 
> no tracing
> [   33.450966] 1M skb alloc/free 145179 (usecs)
> 
> echo 1 > enable
> [   97.186379] 1M skb alloc/free 240419 (usecs)
> (tracing slows down kfree_skb() due to event_buffer_lock/buffer_unlock_commit)
> 
> echo 'name==eth5' > filter
> [  139.644161] 1M skb alloc/free 302552 (usecs)
> (running filter_match_preds() for every skb and discarding
> event_buffer is even slower)
> 
> cat bpf_prog > filter
> [  171.150566] 1M skb alloc/free 199463 (usecs)
> (JITed bpf program is safely checking dev->name == eth5 and discarding)
> 
> echo 0 > enable
> [  258.073593] 1M skb alloc/free 144919 (usecs)
> (tracing is disabled, performance is back to original)
> 
> The C program compiled into BPF and then JITed into x86 is faster than the
> filter_match_preds() approach (199-145 msec vs 302-145 msec of overhead).

Great! :)

> tracing+bpf is a tool for safe read-only access to variables without recompiling
> the kernel and without affecting running programs.

Hmm, this feature and trace-event trigger actions can give us
powerful on-the-fly scripting functionality...

> BPF filters can be written manually (see tools/bpf/trace/filter_ex1.c)
> or, better, compiled from restricted C via GCC or LLVM.
> 
> Q: What is the difference between existing BPF and extended BPF?
> A:
> Existing BPF insn from uapi/linux/filter.h
> struct sock_filter {
>         __u16   code;   /* Actual filter code */
>         __u8    jt;     /* Jump true */
>         __u8    jf;     /* Jump false */
>         __u32   k;      /* Generic multiuse field */
> };
> 
> Extended BPF insn from linux/bpf.h
> struct bpf_insn {
>         __u8    code;    /* opcode */
>         __u8    a_reg:4; /* dest register */
>         __u8    x_reg:4; /* source register */
>         __s16   off;     /* signed offset */
>         __s32   imm;     /* signed immediate constant */
> };
> 
> Opcode encoding is the same between old BPF and extended BPF.
> Original BPF has two 32-bit registers.
> Extended BPF has ten 64-bit registers.
> That is the main difference.
> 
> Old BPF used the jt/jf fields for jump insns only.
> New BPF combines them into a generic 'off' field for jump and non-jump insns.
> The k and imm fields have the same meaning.

Looks very interesting. :)

Thank you!

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


Thread overview: 65+ messages
2013-12-03  4:28 [RFC PATCH tip 0/5] tracing filters with BPF Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 1/5] Extended BPF core framework Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 2/5] Extended BPF JIT for x86-64 Alexei Starovoitov
2013-12-03  4:28 ` [RFC PATCH tip 3/5] Extended BPF (64-bit BPF) design document Alexei Starovoitov
2013-12-03 17:01   ` H. Peter Anvin
2013-12-03 19:59     ` Alexei Starovoitov
2013-12-03 20:41       ` Frank Ch. Eigler
2013-12-03 21:31         ` Alexei Starovoitov
2013-12-04  9:24           ` Ingo Molnar
2013-12-03  4:28 ` [RFC PATCH tip 4/5] use BPF in tracing filters Alexei Starovoitov
2013-12-04  0:48   ` Masami Hiramatsu
2013-12-04  1:11     ` Steven Rostedt
2013-12-05  0:05       ` Masami Hiramatsu
2013-12-05  5:11         ` Alexei Starovoitov
2013-12-06  8:43           ` Masami Hiramatsu
2013-12-06 10:05             ` Jovi Zhangwei
2013-12-06 23:48               ` Masami Hiramatsu
2013-12-08 18:22                 ` Frank Ch. Eigler
2013-12-09 10:12                   ` Masami Hiramatsu
2013-12-03  4:28 ` [RFC PATCH tip 5/5] tracing filter examples in BPF Alexei Starovoitov
2013-12-04  0:35   ` Jonathan Corbet
2013-12-04  1:21     ` Alexei Starovoitov
2013-12-03  9:16 ` [RFC PATCH tip 0/5] tracing filters with BPF Ingo Molnar
2013-12-03 15:33   ` Steven Rostedt
2013-12-03 18:26     ` Alexei Starovoitov
2013-12-04  1:13       ` Masami Hiramatsu
2013-12-09  7:29         ` Namhyung Kim
2013-12-09  9:51           ` Masami Hiramatsu
2013-12-03 18:06   ` Alexei Starovoitov
2013-12-04  9:34     ` Ingo Molnar
2013-12-04 17:36       ` Alexei Starovoitov
2013-12-05 10:38         ` Ingo Molnar
2013-12-06  5:43           ` Alexei Starovoitov
2013-12-03 10:34 ` Masami Hiramatsu [this message]
2013-12-04  0:01 ` Andi Kleen
2013-12-04  3:09   ` Alexei Starovoitov
2013-12-05  4:40     ` Alexei Starovoitov
2013-12-05 10:41       ` Ingo Molnar
2013-12-05 13:46         ` Steven Rostedt
2013-12-05 22:36           ` Alexei Starovoitov
2013-12-05 23:37             ` Steven Rostedt
2013-12-06  4:49               ` Alexei Starovoitov
2013-12-10 15:47                 ` Ingo Molnar
2013-12-11  2:32                   ` Alexei Starovoitov
2013-12-11  3:35                     ` Masami Hiramatsu
2013-12-12  2:48                       ` Alexei Starovoitov
2013-12-05 16:11       ` Frank Ch. Eigler
2013-12-05 19:43         ` Alexei Starovoitov
2013-12-06  0:14       ` Andi Kleen
2013-12-06  1:10         ` H. Peter Anvin
2013-12-06  1:20           ` Andi Kleen
2013-12-06  1:28             ` H. Peter Anvin
2013-12-06 21:43               ` Frank Ch. Eigler
2013-12-06  5:16             ` Alexei Starovoitov
2013-12-06 23:54               ` Masami Hiramatsu
2013-12-07  1:01                 ` Alexei Starovoitov
2013-12-06  5:46             ` Jovi Zhangwei
2013-12-07  1:12             ` Alexei Starovoitov
2013-12-07 16:53               ` Jovi Zhangwei
2013-12-06  5:19       ` Jovi Zhangwei
2013-12-06 23:58         ` Masami Hiramatsu
2013-12-07 16:21           ` Jovi Zhangwei
2013-12-09  4:59             ` Masami Hiramatsu
2013-12-06  6:17       ` Jovi Zhangwei
2013-12-05 16:31   ` Frank Ch. Eigler
