Re: [PATCH v5 net-next 1/3] filter: add Extended BPF interpreter and converter

From: Daniel Borkmann <dborkman@redhat.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Ingo Molnar <mingo@kernel.org>, Will Drewry <wad@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Hagen Paul Pfeifer <hagen@jauu.net>,
	Jesse Gross <jesse@nicira.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Tom Zanussi <tom.zanussi@linux.intel.com>,
	Jovi Zhangwei <jovi.zhangwei@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Pekka Enberg <penberg@iki.fi>,
	Arjan van de Ven <arjan@infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v5 net-next 1/3] filter: add Extended BPF interpreter and converter
Date: Wed, 05 Mar 2014 10:24:28 +0100
Message-ID: <5316ED4C.9080402@redhat.com>
In-Reply-To: <1393971437-4129-2-git-send-email-ast@plumgrid.com>

On 03/04/2014 11:17 PM, Alexei Starovoitov wrote:
> Extended BPF extends old BPF in the following ways:
> - from 2 to 10 registers
>    Original BPF has two registers (A and X) and hidden frame pointer.
>    Extended BPF has ten registers and read-only frame pointer.
> - from 32-bit registers to 64-bit registers
>    semantics of old 32-bit ALU operations are preserved via 32-bit
>    subregisters
> - if (cond) jump_true; else jump_false;
>    old BPF insns are replaced with:
>    if (cond) jump_true; /* else fallthrough */
> - adds signed > and >= insns
> - 16 4-byte stack slots for register spill-fill replaced with
>    up to 512 bytes of multi-use stack space
> - introduces bpf_call insn and register passing convention for zero
>    overhead calls from/to other kernel functions (not part of this patch)
> - adds arithmetic right shift insn
> - adds swab32/swab64 insns
> - adds atomic_add insn
> - old tax/txa insns are replaced with 'mov dst,src' insn
>
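
To make the jump change in the list above concrete, here is a minimal
sketch of how a two-target "jeq" could be lowered into fall-through
style instructions. It is an illustration only, not code from the
patch: struct old_jmp, struct new_insn, the NEW_* opcodes and
lower_jeq() are hypothetical names, and the offsets assume that every
following instruction occupies exactly one slot, which a real
converter cannot assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified forms; NOT the structs from the patch. */
struct old_jmp {		/* "if (A == k) goto jt; else goto jf;" */
	uint8_t  jt, jf;	/* both relative to the next insn */
	uint32_t k;
};

enum new_op { NEW_JEQ_K, NEW_JNE_K, NEW_JA };

struct new_insn {		/* "if (cond) goto off;" else fall through */
	uint8_t  op;
	int16_t  off;		/* relative to the next insn */
	uint32_t k;
};

/*
 * Lower one old-style jeq into one or two new-style insns.
 * Returns the number of insns written to out[].
 */
static size_t lower_jeq(struct old_jmp j, struct new_insn out[2])
{
	assert(out != NULL);

	if (j.jf == 0) {
		/* false branch is the next insn: plain fall-through */
		out[0] = (struct new_insn){ .op = NEW_JEQ_K, .off = j.jt, .k = j.k };
		return 1;
	}
	if (j.jt == 0) {
		/* true branch is the next insn: invert the condition */
		out[0] = (struct new_insn){ .op = NEW_JNE_K, .off = j.jf, .k = j.k };
		return 1;
	}
	/* general case: conditional jump followed by an unconditional one */
	out[0] = (struct new_insn){ .op = NEW_JEQ_K, .off = j.jt + 1, .k = j.k };
	out[1] = (struct new_insn){ .op = NEW_JA, .off = j.jf };
	return 2;
}
```

In the general case the conditional jump needs an offset of jt + 1
because it has to skip over the extra unconditional jump emitted right
after it.
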
> Extended BPF is designed to be JITed with one to one mapping, which
> allows GCC/LLVM backends to generate optimized BPF code that performs
> almost as fast as natively compiled code
>
> sk_convert_filter() remaps old style insns into extended:
> 'sock_filter' instructions are remapped on the fly to
> 'sock_filter_ext' extended instructions when
> sysctl net.core.bpf_ext_enable=1
>
> Old filter comes through sk_attach_filter() or sk_unattached_filter_create()
>   if (bpf_ext_enable) {
>      convert to new
>      sk_chk_filter() - check old bpf
>      use sk_run_filter_ext() - new interpreter
>   } else {
>      sk_chk_filter() - check old bpf
>      if (bpf_jit_enable)
>          use old jit
>      else
>          use sk_run_filter() - old interpreter
>   }
>
> sk_run_filter_ext() interpreter is noticeably faster
> than sk_run_filter() for two reasons:
>
> 1.fall-through jumps
>    Old BPF jump instructions are forced to go either 'true' or 'false'
>    branch which causes branch-miss penalty.
>    Extended BPF jump instructions have one branch and fall-through,
>    which fit CPU branch predictor logic better.
>    'perf stat' shows drastic difference for branch-misses.
>
> 2.jump-threaded implementation of interpreter vs switch statement
>    Instead of single tablejump at the top of 'switch' statement, GCC will
>    generate multiple tablejump instructions, which helps CPU branch predictor
>
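
The difference between the two dispatch styles in point 2 can be
sketched with a toy interpreter. This is not the interpreter from the
patch: the opcodes, struct toy_insn and both functions are made up for
illustration, and the threaded variant relies on the GNU C "labels as
values" extension (GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes; NOT the instruction set from the patch. */
enum toy_op { TOY_ADD_K, TOY_MUL_K, TOY_JEQ_K, TOY_RET, TOY_OP_MAX };

struct toy_insn {
	uint8_t  op;
	int16_t  off;	/* taken-branch offset, relative to the next insn */
	uint32_t k;	/* immediate operand */
};

/* acc += 2; if (acc == 5) skip the multiply; acc *= 10; return acc */
static const struct toy_insn toy_prog[] = {
	{ .op = TOY_ADD_K, .k = 2 },
	{ .op = TOY_JEQ_K, .k = 5, .off = 1 },
	{ .op = TOY_MUL_K, .k = 10 },
	{ .op = TOY_RET },
};

/* Variant 1: one switch, so all opcodes share one indirect jump. */
static uint64_t toy_run_switch(const struct toy_insn *insn, uint64_t acc)
{
	for (;; insn++) {
		switch (insn->op) {
		case TOY_ADD_K: acc += insn->k; break;
		case TOY_MUL_K: acc *= insn->k; break;
		case TOY_JEQ_K:
			if (acc == insn->k)
				insn += insn->off;
			break;	/* not taken: fall through to next insn */
		case TOY_RET:
			return acc;
		default:
			return 0;
		}
	}
}

/* Variant 2: jump-threaded, every handler ends in its own indirect jump. */
static uint64_t toy_run_threaded(const struct toy_insn *insn, uint64_t acc)
{
	static const void *const jumptable[TOY_OP_MAX] = {
		[TOY_ADD_K] = &&do_add,
		[TOY_MUL_K] = &&do_mul,
		[TOY_JEQ_K] = &&do_jeq,
		[TOY_RET]   = &&do_ret,
	};
#define DISPATCH() do {					\
		assert(insn->op < TOY_OP_MAX);		\
		goto *jumptable[insn->op];		\
	} while (0)

	DISPATCH();
do_add:
	acc += insn->k;
	insn++;
	DISPATCH();
do_mul:
	acc *= insn->k;
	insn++;
	DISPATCH();
do_jeq:
	if (acc == insn->k)
		insn += insn->off;
	insn++;
	DISPATCH();
do_ret:
	return acc;
#undef DISPATCH
}
```

Both variants compute the same result for toy_prog; only the shape of
the dispatch differs. In the threaded version each handler carries its
own indirect jump, so the branch predictor can track each of them
separately.
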
> Performance of two BPF filters generated by libpcap was measured
> on x86_64, i386 and arm32.
>
> fprog #1 is taken from Documentation/networking/filter.txt:
> tcpdump -i eth0 port 22 -dd
>
> fprog #2 is taken from 'man tcpdump':
> tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
>     ((tcp[12]&0xf0)>>2)) != 0)' -dd
>
> Other libpcap programs have similar performance differences.
>
> Raw performance data from BPF micro-benchmark:
> SK_RUN_FILTER on same SKB (cache-hit) or 10k SKBs (cache-miss)
> time in nsec per call, smaller is better
> --x86_64--
>              fprog #1   fprog #1    fprog #2   fprog #2
>              cache-hit  cache-miss  cache-hit  cache-miss
> old BPF             90         101        192         202
> ext BPF             31          71         47          97
> old BPF jit         12          34         17          44
> ext BPF jit        TBD
>
> --i386--
>              fprog #1   fprog #1    fprog #2   fprog #2
>              cache-hit  cache-miss  cache-hit  cache-miss
> old BPF            107         136        227         252
> ext BPF             40         119         69         172
>
> --arm32--
>              fprog #1   fprog #1    fprog #2   fprog #2
>              cache-hit  cache-miss  cache-hit  cache-miss
> old BPF            202         300        475         540
> ext BPF            139         270        296         470
> old BPF jit         26         182         37         202
> new BPF jit        TBD
>
> Tested with trinity BPF fuzzer
>
> Future work:
>
> 0. seccomp
>
> 1. add extended BPF JIT for x86_64
>
> 2. add inband old/new demux and extended BPF verifier, so that new programs
>     can be loaded through old sk_attach_filter() and sk_unattached_filter_create()
>     interfaces
>
> 3. tracing filters systemtap-like with extended BPF
>
> 4. OVS with extended BPF
>
> 5. nftables with extended BPF
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>

From what I can tell, this looks good to me:

Reviewed-by: Daniel Borkmann <dborkman@redhat.com>

So the next step would be to add selftests, and after that the JIT?

...
> +#undef LOAD_IMM
> +}
> +EXPORT_SYMBOL(sk_run_filter_ext);
> +

One minor thing I noticed when I git-am'ed your patch is the newline at
the end of the file, but perhaps this can be fixed up directly in patchwork.

> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index cf9cd13509a7..e1b979312588 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c

Thread overview: 10+ messages
2014-03-04 22:17 [PATCH v5 net-next 0/3] filter: add Extended BPF interpreter and converter Alexei Starovoitov
2014-03-04 22:17 ` [PATCH v5 net-next 1/3] " Alexei Starovoitov
2014-03-05  9:24   ` Daniel Borkmann [this message]
2014-03-05 18:13     ` Alexei Starovoitov
2014-03-04 22:17 ` [PATCH v5 net-next 2/3] [RFC] seccomp: convert seccomp to use extended BPF Alexei Starovoitov
2014-03-05  3:11   ` Alexei Starovoitov
2014-03-05 21:42     ` Kees Cook
2014-03-06  2:00       ` Alexei Starovoitov
2014-03-04 22:17 ` [PATCH v5 net-next 3/3] doc: filter: add Extended BPF documentation Alexei Starovoitov
2014-03-05  9:25   ` Daniel Borkmann
