Re: [PATCH RFC 0/9] socket filtering using nf_tables

netfilter-devel.vger.kernel.org archive mirror

From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Borkmann <dborkman@redhat.com>,
	netfilter-devel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>,
	kaber@trash.net, Eric Dumazet <eric.dumazet@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC 0/9] socket filtering using nf_tables
Date: Thu, 13 Mar 2014 13:29:13 +0100
Message-ID: <20140313122913.GA4898@localhost>
In-Reply-To: <CAMEtUuztfWniTr+3u6ttuygvG=xYdp0ANn_w9MGqacQvHrXtWA@mail.gmail.com>

On Wed, Mar 12, 2014 at 08:29:07PM -0700, Alexei Starovoitov wrote:
> On Wed, Mar 12, 2014 at 2:15 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> The patches don't explain the reasons to do nft socket filtering.

OK, some reasons from the interface point of view:

1) It provides an extensible interface to userspace. We didn't
   reinvent the wheel in that regard, we just reused the
   extensibility of the TLVs used in netlink as the intermediate
   format between user and kernelspace, which many other applications
   out there also use. The TLV parsing and building is not new code,
   most of that code has already been exposed to userspace through
   netlink.

2) It shows that, with little generalisation, we can open the door to
   one single *classification interface* for the users. Just do a
   little googling and you'll find *lots of people* barfing on the fact
   that we have so many interfaces to classify packets in Linux. And I'm
   *not* talking about the packet classification approach itself, that's a
   different debate of course.
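
To illustrate the extensibility argument in point 1, here is a minimal
sketch of a TLV writer and parser. It is not the netlink or nf_tables
code: the attribute numbers (ATTR_*), struct tlv_hdr, struct
payload_expr and both helpers are hypothetical names made up for this
example, and the alignment padding that real netlink attributes carry
is left out. What it tries to show is that a parser which skips
attribute types it does not know keeps working when a newer sender
adds one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types, not the real nf_tables numbering. */
enum {
	ATTR_DREG   = 1,
	ATTR_OFFSET = 2,
	ATTR_LEN    = 3,
	ATTR_FUTURE = 9,	/* stands in for an attribute added later */
};

/* Simplified TLV header: no padding, value length excludes the header. */
struct tlv_hdr {
	uint16_t type;
	uint16_t len;
};

/* What this (old) parser understands about a payload expression. */
struct payload_expr {
	uint32_t dreg;
	uint32_t offset;
	uint32_t len;
};

/* Append one TLV with a 32-bit value. Returns the new write offset,
 * or 0 if the buffer is too small (a successful write never returns 0). */
static size_t tlv_put_u32(uint8_t *buf, size_t cap, size_t off,
			  uint16_t type, uint32_t val)
{
	struct tlv_hdr h = { type, sizeof(val) };

	if (off > cap || cap - off < sizeof(h) + sizeof(val))
		return 0;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), &val, sizeof(val));
	return off + sizeof(h) + sizeof(val);
}

/* Walk the TLVs and fill *out. Unknown types are skipped, not rejected.
 * Returns 0 on success, -1 if the buffer is truncated or malformed. */
static int payload_parse(const uint8_t *buf, size_t n,
			 struct payload_expr *out)
{
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	while (off < n) {
		struct tlv_hdr h;
		uint32_t v;

		if (n - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (h.len > n - off)
			return -1;
		if (h.len == sizeof(v)) {
			memcpy(&v, buf + off, sizeof(v));
			switch (h.type) {
			case ATTR_DREG:   out->dreg = v;   break;
			case ATTR_OFFSET: out->offset = v; break;
			case ATTR_LEN:    out->len = v;    break;
			default:          break;	/* unknown: ignore */
			}
		}
		off += h.len;
	}
	return 0;
}
```

A message that carries ATTR_FUTURE between the known attributes still
parses into the same struct payload_expr, which is the property the
fixed-layout structure discussed further down does not have.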

[...]
> Simplicity and performance should be the deciding factor.
> imo nft+sock_filter example is not simple.

OK, some comments in that regard:

1) Simplicity: With the nft approach you can just use a filter
expressed in JSON, e.g.

{"rule":
 {"expr":[
  {"type":"meta","dreg":1,"key":"protocol"},
  {"type":"cmp","sreg":1,"op":"eq","cmpdata":{"data_reg":{"type":"value","len":2,"data0":"0x00000008"}}},
  {"type":"payload","dreg":1,"offset":9,"len":1,"base":"network"},
  {"type":"cmp","sreg":1,"op":"eq","cmpdata":{"data_reg":{"type":"value","len":1,"data0":"0x00000006"}}},
  {"type":"immediate","dreg":0,"immediatedata":{"data_reg":{"type":"verdict","verdict":"0xffffffff"}}}]
 }
}

Which is still more readable and easier to maintain than a BPF
snippet. So you just pass it to the json parser in the libnftnl
library (or whatever better minimalistic library we may eventually
have), which transforms it into the TLV format that you can pass to
the kernel. The kernel then does the job of translating it into its
internal representation.
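
For readers who want to see what the five expressions in that rule do,
here is a rough sketch of a register-based evaluator. It is not the
nf_tables core: struct expr, struct pkt, run_rule() and the EXPR_*
names are made up for illustration, the EtherType is compared in host
byte order (0x0800) rather than as the raw bytes the JSON carries, and
a failed comparison is assumed to simply end evaluation with verdict 0.

```c
#include <stddef.h>
#include <stdint.h>

#define NREGS 4		/* regs[0] holds the verdict in this sketch */

/* Hypothetical in-memory form of the expression types used above. */
enum expr_type {
	EXPR_META_PROTOCOL,	/* reg <- link-layer protocol */
	EXPR_PAYLOAD_NETWORK,	/* reg <- len bytes at offset in network header */
	EXPR_CMP_EQ,		/* stop unless reg == data */
	EXPR_IMMEDIATE,		/* reg <- data */
};

struct expr {
	enum expr_type type;
	unsigned int reg;
	size_t offset;
	size_t len;
	uint32_t data;
};

struct pkt {
	uint16_t protocol;	/* EtherType, host byte order (assumption) */
	const uint8_t *nh;	/* start of the network header */
	size_t nh_len;
};

/* Evaluate the expressions in order; returns the verdict left in regs[0],
 * or 0 as soon as a comparison fails or an access is out of bounds. */
static uint32_t run_rule(const struct expr *e, size_t n, const struct pkt *p)
{
	uint32_t regs[NREGS] = { 0 };

	for (size_t i = 0; i < n; i++) {
		if (e[i].reg >= NREGS)
			return 0;
		switch (e[i].type) {
		case EXPR_META_PROTOCOL:
			regs[e[i].reg] = p->protocol;
			break;
		case EXPR_PAYLOAD_NETWORK: {
			uint32_t v = 0;

			if (e[i].len > sizeof(v) || e[i].offset > p->nh_len ||
			    e[i].len > p->nh_len - e[i].offset)
				return 0;
			for (size_t j = 0; j < e[i].len; j++)
				v = (v << 8) | p->nh[e[i].offset + j];
			regs[e[i].reg] = v;
			break;
		}
		case EXPR_CMP_EQ:
			if (regs[e[i].reg] != e[i].data)
				return 0;
			break;
		case EXPR_IMMEDIATE:
			regs[e[i].reg] = e[i].data;
			break;
		}
	}
	return regs[0];
}

/* The rule from the JSON: IPv4 and protocol field (offset 9) == TCP. */
static const struct expr tcp_over_ipv4[] = {
	{ EXPR_META_PROTOCOL,   1, 0, 0, 0 },
	{ EXPR_CMP_EQ,          1, 0, 0, 0x0800 },
	{ EXPR_PAYLOAD_NETWORK, 1, 9, 1, 0 },
	{ EXPR_CMP_EQ,          1, 0, 0, 6 },
	{ EXPR_IMMEDIATE,       0, 0, 0, 0xffffffffu },
};
```

Calling run_rule() with this table on an IPv4/TCP packet leaves
0xffffffff in register 0; anything else stops at one of the two
comparisons.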

How are users going to integrate the restricted-C eBPF code
infrastructure into their projects? I don't think that will be simple.
They will likely include the BPF snippet to avoid all the integration
burden, as already happens in many projects with BPF filters.

2) Performance. Patrick has been making a lot of progress on the
   generic set infrastructure for nft. In that regard, we aim to
   achieve performance by arranging data in efficient data structures,
   *jit is not the only way to boost performance* (although we also
   want to have it).

An example:

   set type IPv4 address = { N thousands of IPv4 addresses }

   reg1 <- payload(network header, offsetof(struct iphdr, daddr), 4)
   lookup(reg1, set)
   reg0 <- immediate(0xffffffff)

Or even better, using dictionaries:

   set type IPv4 address = { 1.1.1.1 : accept, 2.2.2.2 : accept, 3.3.3.3 : drop ...}

   reg1 <- payload(network header, offsetof(struct iphdr, daddr), 4)
   reg0 <- lookup(reg1, set)

Where "accept" is an alias of 0xffffffff and "drop" is 0 in the
nft-sock case. The verdict part has been generalised so we can adapt
nft to the corresponding packet classification engine.
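
As a rough sketch of the dictionary case: one lookup keyed on the
destination address yields the verdict directly, so the cost does not
grow linearly with the number of addresses. This is not the nft set
code. A sorted array with bsearch() stands in for whatever set backend
the kernel would use, and struct map_elem, map_prepare(), map_lookup()
and ipv4() are hypothetical names. What happens on a miss is not
covered above, so the caller passes the miss verdict explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VERDICT_ACCEPT 0xffffffffu	/* "accept" in the nft-sock case */
#define VERDICT_DROP   0u		/* "drop" in the nft-sock case */

/* One dictionary element: IPv4 address (host order here) -> verdict. */
struct map_elem {
	uint32_t addr;
	uint32_t verdict;
};

static int elem_cmp(const void *a, const void *b)
{
	uint32_t x = ((const struct map_elem *)a)->addr;
	uint32_t y = ((const struct map_elem *)b)->addr;

	return (x > y) - (x < y);
}

/* Sort once when the set is loaded, so lookups can binary-search. */
static void map_prepare(struct map_elem *m, size_t n)
{
	qsort(m, n, sizeof(*m), elem_cmp);
}

/* reg0 <- lookup(reg1, set): O(log n) instead of one cmp per address.
 * 'miss' is returned when addr is not in the map (an assumption). */
static uint32_t map_lookup(const struct map_elem *m, size_t n,
			   uint32_t addr, uint32_t miss)
{
	struct map_elem key = { addr, 0 };
	const struct map_elem *hit;

	hit = bsearch(&key, m, n, sizeof(*m), elem_cmp);
	return hit ? hit->verdict : miss;
}

/* Build a host-order IPv4 address from dotted-quad components. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

With the three elements from the dictionary example loaded, looking up
1.1.1.1 gives VERDICT_ACCEPT and 3.3.3.3 gives VERDICT_DROP, each with
a single lookup regardless of how many thousands of entries the map
holds.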

This part just needs a follow-up patch to add set-based filtering for
nft-sockets. For more information about the ongoing efforts on the nft
set infrastructure, which will also integrate into nft-sock, see:
http://patchwork.ozlabs.org/patch/326368/

[...]
> > +struct sock_filter_ext {
> > +       __u8    code;    /* opcode */
> > +       __u8    a_reg:4; /* dest register */
> > +       __u8    x_reg:4; /* source register */
> > +       __s16   off;     /* signed offset */
> > +       __s32   imm;     /* signed immediate constant */
> > +};
> >
> > If I didn't miss anything from your patchset, that structure is again
> > exposed to userspace, so we won't be able to change it ever unless we
> > add a new binary interface.
> >
> > From the user interface perspective, our nft approach is quite
> > different, from userspace you express the filter using TLVs (like in
> > the payload of the netlink message). Then, nf_tables core
> > infrastructure parses this and transforms it to the kernel
> > representation. It's very flexible since you don't expose the internal
> > representation to userspace, which means that we can change the
> > internal layout anytime, thus, highly extensible.
> 
> bpf/ebpf is an instruction set. Arguing about fixed vs non-fixed
> insn size is like arguing x86 variable encoding vs sparc fixed.
> Both are infinitely flexible.

Right, you can extend interfaces forever with lots of patchwork and
"smart tricks" but that doesn't mean that it will look nice...

As I said, I believe that having a nice extensible interface is
extremely important to make development easier. If we have to
rearrange the internal representation for some reason, we can indeed
do so without having to worry about translations to avoid breaking
userspace, and without resorting to ugly tricks (just see
sk_decode_filter() or any other translation to support any new way to
express a filter...).

[...]
> Say you want to translate nft-cmp instruction into sequence of native
> comparisons. You'd need to use load from memory, compare and
> branch operations. That's ebpf!

Nope sorry, that's not ebpf. That's assembler code.

[...]
> You can view ebpf as a tool to achieve jiting of nft.
> It will save you a lot of time.

The nft interface is already well abstracted from the representation,
so I don't see a good reason to take a step backward that would force
us to represent the instructions using a fixed-layout structure that
is exposed to userspace and that we won't be able to change once this
gets into mainline.

nft is probably not the easiest path, I agree; it hasn't been so far
if you look at the record. But with time and development hours from
everyone, I believe we'll enjoy a nice framework.
