From mboxrd@z Thu Jan  1 00:00:00 1970
From: Pablo Neira Ayuso <pablo@netfilter.org>
Subject: Re: [PATCH next] iptables: add xt_bpf match
Date: Tue, 8 Jan 2013 04:21:23 +0100
Message-ID: <20130108032123.GA16502@1984>
References: <20121208033111.GB28114@1984>
 <1355089978-24463-1-git-send-email-willemb@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netfilter-devel@vger.kernel.org
To: Willem de Bruijn <willemb@google.com>
Return-path: <netfilter-devel-owner@vger.kernel.org>
Received: from mail.us.es ([193.147.175.20]:57549 "EHLO mail.us.es"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750934Ab3AHDVd (ORCPT <rfc822;netfilter-devel@vger.kernel.org>);
	Mon, 7 Jan 2013 22:21:33 -0500
Content-Disposition: inline
In-Reply-To: <1355089978-24463-1-git-send-email-willemb@google.com>
Sender: netfilter-devel-owner@vger.kernel.org
List-ID: <netfilter-devel.vger.kernel.org>

Hi Willem,

On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote:
> Support arbitrary linux socket filter (BPF) programs as iptables
> match rules. This allows for very expressive filters, and on
> platforms with BPF JIT appears competitive with traditional hardcoded
> iptables rules.
> 
> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
> any iptables rules (40 GBps),
> 
> inserting 100x this bpf rule gives 28K
> 
>     ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j
> 
>     (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')
> 
> inserting 100x this u32 rule gives 21K
> 
>     ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP
> 
> The two are logically equivalent, as far as I can tell. Let me know
> if my test methodology is flawed in some way. Even in cases where
> slower, the filter adds functionality currently lacking in iptables,
> such as access to sk_buff fields like rxhash and queue_mapping.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  include/linux/netfilter/xt_bpf.h |   17 +++++++
>  net/netfilter/Kconfig            |    9 ++++
>  net/netfilter/Makefile           |    1 +
>  net/netfilter/x_tables.c         |    5 +-
>  net/netfilter/xt_bpf.c           |   86 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 116 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/netfilter/xt_bpf.h
>  create mode 100644 net/netfilter/xt_bpf.c
> 
> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
> new file mode 100644
> index 0000000..23502c0
> --- /dev/null
> +++ b/include/linux/netfilter/xt_bpf.h
> @@ -0,0 +1,17 @@
> +#ifndef _XT_BPF_H
> +#define _XT_BPF_H
> +
> +#include <linux/filter.h>
> +#include <linux/types.h>
> +
> +struct xt_bpf_info {
> +	__u16 bpf_program_num_elem;
> +
> +	/* only used in kernel */
> +	struct sk_filter *filter __attribute__((aligned(8)));

I see. You set match->userspacesize to zero in libxt_bpf to skip the
comparison of that internal struct sk_filter *filter.

> +
> +	/* variable size, based on program_num_elem */
> +	struct sock_filter bpf_program[0];

While testing this I noticed:

iptables -I OUTPUT -m bpf --bytecode   \
        '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT

The insertion above is fine. The deletion below also succeeds, but it
should not:

iptables -D OUTPUT -m bpf --bytecode   \
        '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT
                                                               ^
Note the 1 marked above: this is a different filter, yet it deletes the
previously inserted rule without any error here.

A quick look at make_delete_mask() in iptables tells me that the
changes you made to userspace to allow variable-size matches are not
enough to generate a sane mask, which is essential for finding the
matching rule during deletion.
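
To illustrate the problem, here is a minimal sketch in C. It is not the
actual iptables code: struct demo_match and the demo_* helpers are
hypothetical names, and the model assumes that the delete mask covers
only the first userspacesize bytes of a match's data, which is how I
read make_delete_mask(). With userspacesize set to zero, no byte of the
bytecode takes part in the comparison, so two different programs
compare as equal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one match's data blob. */
struct demo_match {
	size_t size;            /* total bytes of match data */
	size_t userspacesize;   /* leading bytes compared by user space */
	unsigned char data[64];
};

/* Build a mask: 0xFF over the compared prefix, 0x00 over the rest. */
static void demo_make_mask(const struct demo_match *m, unsigned char *mask)
{
	memset(mask, 0x00, m->size);
	memset(mask, 0xFF, m->userspacesize);
}

/* Compare two blobs, looking only at the bits selected by the mask. */
static int demo_masked_equal(const unsigned char *a, const unsigned char *b,
			     const unsigned char *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if ((a[i] ^ b[i]) & mask[i])
			return 0;
	return 1;
}

/* Return 1 if the rule to delete is considered equal to the stored rule. */
static int demo_rules_match(const struct demo_match *a,
			    const struct demo_match *b)
{
	unsigned char mask[sizeof(a->data)];

	if (a->size != b->size || a->size > sizeof(a->data) ||
	    a->userspacesize > a->size)
		return 0;

	demo_make_mask(a, mask);
	return demo_masked_equal(a->data, b->data, mask, a->size);
}
```

In this model, two blobs that differ in a single byte are reported as
equal when userspacesize is 0, and as different once userspacesize
covers the whole blob. A variable-size match would need the mask to
cover the program bytes while still skipping the kernel-only pointer.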