* [PATCH rfc] netfilter: two xtables matches
@ 2012-12-05 19:22 Willem de Bruijn
2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
To: netfilter-devel, netdev, edumazet, davem, kaber, pablo
The second patch is more speculative and aims to be a more general
workaround, as well as a performance optimization: support
(preferably JIT compiled) BPF programs as iptables match rules.
Potentially, the skb->priority match can be implemented by applying
only the second patch and adding a new BPF_S_ANC ancillary field to
Linux Socket Filters.
I also wrote corresponding userspace patches to iptables. The process
for submitting both kernel and user patches is not 100% clear to me.
Sending the kernel bits to both netdev and netfilter-devel for
initial feedback. Please correct me if you want it another way.
The patches apply to net-next.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] netfilter: add xt_priority xtables match
2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
@ 2012-12-05 19:22 ` Willem de Bruijn
2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn
2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2 siblings, 0 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn
Add an iptables match based on the skb->priority field. This field
can be set by socket option SO_PRIORITY, among others.
The match supports range based matching on packet priority, with
optional inversion. Before matching, a mask can be applied to the
priority field to handle the case where different regions of the
bitfield are reserved for unrelated uses.
---
include/linux/netfilter/xt_priority.h | 13 ++++++++
net/netfilter/Kconfig | 9 ++++++
net/netfilter/Makefile | 1 +
net/netfilter/xt_priority.c | 51 +++++++++++++++++++++++++++++++++
4 files changed, 74 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_priority.h
create mode 100644 net/netfilter/xt_priority.c
diff --git a/include/linux/netfilter/xt_priority.h b/include/linux/netfilter/xt_priority.h
new file mode 100644
index 0000000..da9a288
--- /dev/null
+++ b/include/linux/netfilter/xt_priority.h
@@ -0,0 +1,13 @@
+#ifndef _XT_PRIORITY_H
+#define _XT_PRIORITY_H
+
+#include <linux/types.h>
+
+struct xt_priority_info {
+ __u32 min;
+ __u32 max;
+ __u32 mask;
+ __u8 invert;
+};
+
+#endif /*_XT_PRIORITY_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index fefa514..c9739c6 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_MATCH_PRIORITY
+ tristate '"priority" match support'
+ depends on NETFILTER_ADVANCED
+ help
+ This option adds a match based on the value of the sk_buff
+ priority field.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_MATCH_QUOTA
tristate '"quota" match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 3259697..8e5602f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_PRIORITY) += xt_priority.o
obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o
obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
diff --git a/net/netfilter/xt_priority.c b/net/netfilter/xt_priority.c
new file mode 100644
index 0000000..4982eee
--- /dev/null
+++ b/net/netfilter/xt_priority.c
@@ -0,0 +1,51 @@
+/* Xtables module to match packets based on their sk_buff priority field.
+ * Copyright 2012 Google Inc.
+ * Written by Willem de Bruijn <willemb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter/xt_priority.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
+MODULE_DESCRIPTION("Xtables: priority filter match");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_priority");
+MODULE_ALIAS("ip6t_priority");
+
+static bool priority_mt(const struct sk_buff *skb,
+ struct xt_action_param *par)
+{
+ const struct xt_priority_info *info = par->matchinfo;
+
+ __u32 priority = skb->priority & info->mask;
+ return (priority >= info->min && priority <= info->max) ^ info->invert;
+}
+
+static struct xt_match priority_mt_reg __read_mostly = {
+ .name = "priority",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .match = priority_mt,
+ .matchsize = sizeof(struct xt_priority_info),
+ .me = THIS_MODULE,
+};
+
+static int __init priority_mt_init(void)
+{
+ return xt_register_match(&priority_mt_reg);
+}
+
+static void __exit priority_mt_exit(void)
+{
+ xt_unregister_match(&priority_mt_reg);
+}
+
+module_init(priority_mt_init);
+module_exit(priority_mt_exit);
--
1.7.7.3
* [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn
@ 2012-12-05 19:22 ` Willem de Bruijn
2012-12-05 19:48 ` Pablo Neira Ayuso
2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn
A new match that executes sk_run_filter on every packet. BPF filters
can access skbuff fields that are out of scope for existing iptables
rules, allow more expressive logic, and on platforms with JIT support
can even be faster.
I have a corresponding iptables patch that takes `tcpdump -ddd`
output, as used in the examples below. The two parts communicate
using a variable length structure. This is similar to ebt_among,
but new for iptables.
Verified functionality by inserting an ip source filter on chain
INPUT and an ip dest filter on chain OUTPUT and noting that ping
failed while a rule was active:
iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
Evaluated throughput by running netperf TCP_STREAM over loopback on
x86_64. I expected the BPF filter to outperform hardcoded iptables
filters when replacing multiple matches with a single bpf match, but
even a single comparison to u32 appears to do better. Relative to the
benchmark with no filter applied, rate with 100 BPF filters dropped
to 81%. With 100 U32 filters it dropped to 55%. The difference sounds
excessive to me, but was consistent on my hardware. Commands used:
for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
iptables -F OUTPUT
for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
FYI: perf top
[bpf]
33.94% [kernel] [k] copy_user_generic_string
8.92% [kernel] [k] sk_run_filter
7.77% [ip_tables] [k] ipt_do_table
[u32]
22.63% [kernel] [k] copy_user_generic_string
14.46% [kernel] [k] memcpy
9.19% [ip_tables] [k] ipt_do_table
8.47% [xt_u32] [k] u32_mt
5.32% [kernel] [k] skb_copy_bits
The big difference appears to be in memory copying. I have not
looked into u32, so cannot explain this right now. More interestingly,
at higher rate, sk_run_filter appears to use as many cycles as u32_mt
(both traces have roughly the same number of events).
One caveat: to work independently of the device link layer, the filter
expects DLT_RAW style BPF programs, i.e., programs that expect the
packet to start at the IP layer.
---
include/linux/netfilter/xt_bpf.h | 17 +++++++
net/netfilter/Kconfig | 9 ++++
net/netfilter/Makefile | 1 +
net/netfilter/x_tables.c | 5 +-
net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++
5 files changed, 118 insertions(+), 2 deletions(-)
create mode 100644 include/linux/netfilter/xt_bpf.h
create mode 100644 net/netfilter/xt_bpf.c
diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
new file mode 100644
index 0000000..23502c0
--- /dev/null
+++ b/include/linux/netfilter/xt_bpf.h
@@ -0,0 +1,17 @@
+#ifndef _XT_BPF_H
+#define _XT_BPF_H
+
+#include <linux/filter.h>
+#include <linux/types.h>
+
+struct xt_bpf_info {
+ __u16 bpf_program_num_elem;
+
+ /* only used in kernel */
+ struct sk_filter *filter __attribute__((aligned(8)));
+
+ /* variable size, based on program_num_elem */
+ struct sock_filter bpf_program[0];
+};
+
+#endif /*_XT_BPF_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c9739c6..c7cc0b8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
If you want to compile it as a module, say M here and read
<file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+config NETFILTER_XT_MATCH_BPF
+ tristate '"bpf" match support'
+ depends on NETFILTER_ADVANCED
+ help
+ BPF matching applies a linux socket filter to each packet and
+ accepts those for which the filter returns non-zero.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_MATCH_CLUSTER
tristate '"cluster" match support'
depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 8e5602f..9f12eeb 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
# matches
obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8d987c3..26306be 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
if (XT_ALIGN(par->match->matchsize) != size &&
par->match->matchsize != -1) {
/*
- * ebt_among is exempt from centralized matchsize checking
- * because it uses a dynamic-size data set.
+ * matches of variable length, such as ebt_among, are
+ * exempt from centralized matchsize checking. They skip
+ * the test by setting xt_match.matchsize to -1.
*/
pr_err("%s_tables: %s.%u match: invalid size "
"%u (kernel) != (user) %u\n",
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
new file mode 100644
index 0000000..07077c5
--- /dev/null
+++ b/net/netfilter/xt_bpf.c
@@ -0,0 +1,88 @@
+/* Xtables module to match packets using a BPF filter.
+ * Copyright 2012 Google Inc.
+ * Written by Willem de Bruijn <willemb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ipv6.h>
+#include <linux/filter.h>
+#include <net/ip.h>
+
+#include <linux/netfilter/xt_bpf.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
+MODULE_DESCRIPTION("Xtables: BPF filter match");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_bpf");
+MODULE_ALIAS("ip6t_bpf");
+
+static int bpf_mt_check(const struct xt_mtchk_param *par)
+{
+ struct xt_bpf_info *info = par->matchinfo;
+ const struct xt_entry_match *match;
+ struct sock_fprog program;
+ int expected_len;
+
+ match = container_of(par->matchinfo, const struct xt_entry_match, data);
+ expected_len = sizeof(struct xt_entry_match) +
+ sizeof(struct xt_bpf_info) +
+ (sizeof(struct sock_filter) *
+ info->bpf_program_num_elem);
+
+ if (match->u.match_size != expected_len) {
+ pr_info("bpf: check failed: incorrect length\n");
+ return -EINVAL;
+ }
+
+ program.len = info->bpf_program_num_elem;
+ program.filter = info->bpf_program;
+ if (sk_unattached_filter_create(&info->filter, &program)) {
+ pr_info("bpf: check failed: parse error\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_bpf_info *info = par->matchinfo;
+
+ return SK_RUN_FILTER(info->filter, skb);
+}
+
+static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
+{
+ const struct xt_bpf_info *info = par->matchinfo;
+ sk_unattached_filter_destroy(info->filter);
+}
+
+static struct xt_match bpf_mt_reg __read_mostly = {
+ .name = "bpf",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .checkentry = bpf_mt_check,
+ .match = bpf_mt,
+ .destroy = bpf_mt_destroy,
+ .matchsize = -1, /* skip xt_check_match because of dynamic len */
+ .me = THIS_MODULE,
+};
+
+static int __init bpf_mt_init(void)
+{
+ return xt_register_match(&bpf_mt_reg);
+}
+
+static void __exit bpf_mt_exit(void)
+{
+ xt_unregister_match(&bpf_mt_reg);
+}
+
+module_init(bpf_mt_init);
+module_exit(bpf_mt_exit);
--
1.7.7.3
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn
2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn
@ 2012-12-05 19:28 ` Willem de Bruijn
2012-12-05 20:00 ` Jan Engelhardt
2 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 19:28 UTC (permalink / raw)
To: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber, pablo
Somehow, the first part of this email went missing. Not critical,
but for completeness:
These two patches each add an xtables match.
The xt_priority match is a straightforward addition in the style of
xt_mark, adding the option to filter on one more sk_buff field. I
have an immediate application for this. The amount of code (in
kernel + userspace) to add a single check proved quite large.
On Wed, Dec 5, 2012 at 2:22 PM, Willem de Bruijn <willemb@google.com> wrote:
> The second patch is more speculative and aims to be a more general
> workaround, as well as a performance optimization: support
> (preferably JIT compiled) BPF programs as iptables match rules.
>
> Potentially, the skb->priority match can be implemented by applying
> only the second patch and adding a new BPF_S_ANC ancillary field to
> Linux Socket Filters.
>
> I also wrote corresponding userspace patches to iptables. The process
> for submitting both kernel and user patches is not 100% clear to me.
> Sending the kernel bits to both netdev and netfilter-devel for
> initial feedback. Please correct me if you want it another way.
>
> The patches apply to net-next.
>
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn
@ 2012-12-05 19:48 ` Pablo Neira Ayuso
2012-12-05 20:10 ` Willem de Bruijn
0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-05 19:48 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: netfilter-devel, netdev, edumazet, davem, kaber
Hi Willem,
On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
> A new match that executes sk_run_filter on every packet. BPF filters
> can access skbuff fields that are out of scope for existing iptables
> rules, allow more expressive logic, and on platforms with JIT support
> can even be faster.
>
> I have a corresponding iptables patch that takes `tcpdump -ddd`
> output, as used in the examples below. The two parts communicate
> using a variable length structure. This is similar to ebt_among,
> but new for iptables.
>
> Verified functionality by inserting an ip source filter on chain
> INPUT and an ip dest filter on chain OUTPUT and noting that ping
> failed while a rule was active:
>
> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
I like this BPF idea for iptables.
I made a similar extension some time ago, but it took a file as a
parameter. That file contained the BPF code. I made a simple bison
parser that takes BPF code and puts it into the bpf array of
instructions. It would be a bit more intuitive for defining a filter,
and we could distribute it with iptables.
Let me check my internal trees; I can put that user-space code
somewhere in case you're interested.
> Evaluated throughput by running netperf TCP_STREAM over loopback on
> x86_64. I expected the BPF filter to outperform hardcoded iptables
> filters when replacing multiple matches with a single bpf match, but
> even a single comparison to u32 appears to do better. Relative to the
> benchmark with no filter applied, rate with 100 BPF filters dropped
> to 81%. With 100 U32 filters it dropped to 55%. The difference sounds
> excessive to me, but was consistent on my hardware. Commands used:
>
> for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>
> iptables -F OUTPUT
>
> for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>
> FYI: perf top
>
> [bpf]
> 33.94% [kernel] [k] copy_user_generic_string
> 8.92% [kernel] [k] sk_run_filter
> 7.77% [ip_tables] [k] ipt_do_table
>
> [u32]
> 22.63% [kernel] [k] copy_user_generic_string
> 14.46% [kernel] [k] memcpy
> 9.19% [ip_tables] [k] ipt_do_table
> 8.47% [xt_u32] [k] u32_mt
> 5.32% [kernel] [k] skb_copy_bits
>
> The big difference appears to be in memory copying. I have not
> looked into u32, so cannot explain this right now. More interestingly,
> at higher rate, sk_run_filter appears to use as many cycles as u32_mt
> (both traces have roughly the same number of events).
>
> One caveat: to work independent of device link layer, the filter
> expects DLT_RAW style BPF programs, i.e., those that expect the
> packet to start at the IP layer.
> ---
> include/linux/netfilter/xt_bpf.h | 17 +++++++
> net/netfilter/Kconfig | 9 ++++
> net/netfilter/Makefile | 1 +
> net/netfilter/x_tables.c | 5 +-
> net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++
> 5 files changed, 118 insertions(+), 2 deletions(-)
> create mode 100644 include/linux/netfilter/xt_bpf.h
> create mode 100644 net/netfilter/xt_bpf.c
>
> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
> new file mode 100644
> index 0000000..23502c0
> --- /dev/null
> +++ b/include/linux/netfilter/xt_bpf.h
> @@ -0,0 +1,17 @@
> +#ifndef _XT_BPF_H
> +#define _XT_BPF_H
> +
> +#include <linux/filter.h>
> +#include <linux/types.h>
> +
> +struct xt_bpf_info {
> + __u16 bpf_program_num_elem;
> +
> + /* only used in kernel */
> + struct sk_filter *filter __attribute__((aligned(8)));
> +
> + /* variable size, based on program_num_elem */
> + struct sock_filter bpf_program[0];
> +};
> +
> +#endif /*_XT_BPF_H */
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index c9739c6..c7cc0b8 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
> If you want to compile it as a module, say M here and read
> <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
>
> +config NETFILTER_XT_MATCH_BPF
> + tristate '"bpf" match support'
> + depends on NETFILTER_ADVANCED
> + help
> + BPF matching applies a linux socket filter to each packet and
> + accepts those for which the filter returns non-zero.
> +
> + To compile it as a module, choose M here. If unsure, say N.
> +
> config NETFILTER_XT_MATCH_CLUSTER
> tristate '"cluster" match support'
> depends on NF_CONNTRACK
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 8e5602f..9f12eeb 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
>
> # matches
> obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8d987c3..26306be 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
> if (XT_ALIGN(par->match->matchsize) != size &&
> par->match->matchsize != -1) {
> /*
> - * ebt_among is exempt from centralized matchsize checking
> - * because it uses a dynamic-size data set.
> + * matches of variable length, such as ebt_among, are
> + * exempt from centralized matchsize checking. They skip
> + * the test by setting xt_match.matchsize to -1.
> */
> pr_err("%s_tables: %s.%u match: invalid size "
> "%u (kernel) != (user) %u\n",
> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
> new file mode 100644
> index 0000000..07077c5
> --- /dev/null
> +++ b/net/netfilter/xt_bpf.c
> @@ -0,0 +1,88 @@
> +/* Xtables module to match packets using a BPF filter.
> + * Copyright 2012 Google Inc.
> + * Written by Willem de Bruijn <willemb@google.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/ipv6.h>
> +#include <linux/filter.h>
> +#include <net/ip.h>
> +
> +#include <linux/netfilter/xt_bpf.h>
> +#include <linux/netfilter/x_tables.h>
> +
> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
> +MODULE_DESCRIPTION("Xtables: BPF filter match");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("ipt_bpf");
> +MODULE_ALIAS("ip6t_bpf");
> +
> +static int bpf_mt_check(const struct xt_mtchk_param *par)
> +{
> + struct xt_bpf_info *info = par->matchinfo;
> + const struct xt_entry_match *match;
> + struct sock_fprog program;
> + int expected_len;
> +
> + match = container_of(par->matchinfo, const struct xt_entry_match, data);
> + expected_len = sizeof(struct xt_entry_match) +
> + sizeof(struct xt_bpf_info) +
> + (sizeof(struct sock_filter) *
> + info->bpf_program_num_elem);
> +
> + if (match->u.match_size != expected_len) {
> + pr_info("bpf: check failed: incorrect length\n");
> + return -EINVAL;
> + }
> +
> + program.len = info->bpf_program_num_elem;
> + program.filter = info->bpf_program;
> + if (sk_unattached_filter_create(&info->filter, &program)) {
> + pr_info("bpf: check failed: parse error\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> + const struct xt_bpf_info *info = par->matchinfo;
> +
> + return SK_RUN_FILTER(info->filter, skb);
> +}
> +
> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
> +{
> + const struct xt_bpf_info *info = par->matchinfo;
> + sk_unattached_filter_destroy(info->filter);
> +}
> +
> +static struct xt_match bpf_mt_reg __read_mostly = {
> + .name = "bpf",
> + .revision = 0,
> + .family = NFPROTO_UNSPEC,
> + .checkentry = bpf_mt_check,
> + .match = bpf_mt,
> + .destroy = bpf_mt_destroy,
> + .matchsize = -1, /* skip xt_check_match because of dynamic len */
> + .me = THIS_MODULE,
> +};
> +
> +static int __init bpf_mt_init(void)
> +{
> + return xt_register_match(&bpf_mt_reg);
> +}
> +
> +static void __exit bpf_mt_exit(void)
> +{
> + xt_unregister_match(&bpf_mt_reg);
> +}
> +
> +module_init(bpf_mt_init);
> +module_exit(bpf_mt_exit);
> --
> 1.7.7.3
>
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
@ 2012-12-05 20:00 ` Jan Engelhardt
2012-12-05 21:45 ` Willem de Bruijn
2012-12-06 5:22 ` Pablo Neira Ayuso
0 siblings, 2 replies; 19+ messages in thread
From: Jan Engelhardt @ 2012-12-05 20:00 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber, pablo
On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>Somehow, the first part of this email went missing. Not critical,
>but for completeness:
>
>These two patches each add an xtables match.
>
>The xt_priority match is a straightforward addition in the style of
>xt_mark, adding the option to filter on one more sk_buff field. I
>have an immediate application for this. The amount of code (in
>kernel + userspace) to add a single check proved quite large.
Hm so yeah, can't we just place this in xt_mark.c?
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-05 19:48 ` Pablo Neira Ayuso
@ 2012-12-05 20:10 ` Willem de Bruijn
2012-12-07 13:16 ` Pablo Neira Ayuso
0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 20:10 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber
On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Willem,
>
> On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
>> A new match that executes sk_run_filter on every packet. BPF filters
>> can access skbuff fields that are out of scope for existing iptables
>> rules, allow more expressive logic, and on platforms with JIT support
>> can even be faster.
>>
>> I have a corresponding iptables patch that takes `tcpdump -ddd`
>> output, as used in the examples below. The two parts communicate
>> using a variable length structure. This is similar to ebt_among,
>> but new for iptables.
>>
>> Verified functionality by inserting an ip source filter on chain
>> INPUT and an ip dest filter on chain OUTPUT and noting that ping
>> failed while a rule was active:
>>
>> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
>> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
>
> I like this BPF idea for iptables.
>
> I made a similar extension time ago, but it was taking a file as
> parameter. That file contained in BPF code. I made a simple bison
> parser that takes BPF code and put it into the bpf array of
> instructions. It would be a bit more intuitive to define a filter and
> we can distribute it with iptables.
That's cleaner, indeed. I actually like how tcpdump operates as a
code generator if you pass -ddd. Unfortunately, it generates code only
for link layer types of its supported devices, such as DLT_EN10MB and
DLT_LINUX_SLL. The network layer interface of basic iptables
(setting aside device-dependent mechanisms such as xt_mac) is DLT_RAW,
but that is rarely supported.
> Let me check on my internal trees, I can put that user-space code
> somewhere in case you're interested.
Absolutely. I'll be happy to revise to get it in. I'm also considering
sending a patch to tcpdump to make it generate code independent of the
installed hardware when specifying -y.
>> Evaluated throughput by running netperf TCP_STREAM over loopback on
>> x86_64. I expected the BPF filter to outperform hardcoded iptables
>> filters when replacing multiple matches with a single bpf match, but
>> even a single comparison to u32 appears to do better. Relative to the
>> benchmark with no filter applied, rate with 100 BPF filters dropped
>> to 81%. With 100 U32 filters it dropped to 55%. The difference sounds
>> excessive to me, but was consistent on my hardware. Commands used:
>>
>> for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
>> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>>
>> iptables -F OUTPUT
>>
>> for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
>> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>>
>> FYI: perf top
>>
>> [bpf]
>> 33.94% [kernel] [k] copy_user_generic_string
>> 8.92% [kernel] [k] sk_run_filter
>> 7.77% [ip_tables] [k] ipt_do_table
>>
>> [u32]
>> 22.63% [kernel] [k] copy_user_generic_string
>> 14.46% [kernel] [k] memcpy
>> 9.19% [ip_tables] [k] ipt_do_table
>> 8.47% [xt_u32] [k] u32_mt
>> 5.32% [kernel] [k] skb_copy_bits
>>
>> The big difference appears to be in memory copying. I have not
>> looked into u32, so cannot explain this right now. More interestingly,
>> at higher rate, sk_run_filter appears to use as many cycles as u32_mt
>> (both traces have roughly the same number of events).
>>
>> One caveat: to work independent of device link layer, the filter
>> expects DLT_RAW style BPF programs, i.e., those that expect the
>> packet to start at the IP layer.
>> ---
>> include/linux/netfilter/xt_bpf.h | 17 +++++++
>> net/netfilter/Kconfig | 9 ++++
>> net/netfilter/Makefile | 1 +
>> net/netfilter/x_tables.c | 5 +-
>> net/netfilter/xt_bpf.c | 88 ++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 118 insertions(+), 2 deletions(-)
>> create mode 100644 include/linux/netfilter/xt_bpf.h
>> create mode 100644 net/netfilter/xt_bpf.c
>>
>> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> new file mode 100644
>> index 0000000..23502c0
>> --- /dev/null
>> +++ b/include/linux/netfilter/xt_bpf.h
>> @@ -0,0 +1,17 @@
>> +#ifndef _XT_BPF_H
>> +#define _XT_BPF_H
>> +
>> +#include <linux/filter.h>
>> +#include <linux/types.h>
>> +
>> +struct xt_bpf_info {
>> + __u16 bpf_program_num_elem;
>> +
>> + /* only used in kernel */
>> + struct sk_filter *filter __attribute__((aligned(8)));
>> +
>> + /* variable size, based on program_num_elem */
>> + struct sock_filter bpf_program[0];
>> +};
>> +
>> +#endif /*_XT_BPF_H */
>> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
>> index c9739c6..c7cc0b8 100644
>> --- a/net/netfilter/Kconfig
>> +++ b/net/netfilter/Kconfig
>> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
>> If you want to compile it as a module, say M here and read
>> <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
>>
>> +config NETFILTER_XT_MATCH_BPF
>> + tristate '"bpf" match support'
>> + depends on NETFILTER_ADVANCED
>> + help
>> + BPF matching applies a linux socket filter to each packet and
>> + accepts those for which the filter returns non-zero.
>> +
>> + To compile it as a module, choose M here. If unsure, say N.
>> +
>> config NETFILTER_XT_MATCH_CLUSTER
>> tristate '"cluster" match support'
>> depends on NF_CONNTRACK
>> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
>> index 8e5602f..9f12eeb 100644
>> --- a/net/netfilter/Makefile
>> +++ b/net/netfilter/Makefile
>> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
>>
>> # matches
>> obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
>> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
>> obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
>> obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
>> obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
>> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>> index 8d987c3..26306be 100644
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
>> if (XT_ALIGN(par->match->matchsize) != size &&
>> par->match->matchsize != -1) {
>> /*
>> - * ebt_among is exempt from centralized matchsize checking
>> - * because it uses a dynamic-size data set.
>> + * matches of variable size, such as ebt_among,
>> + * are exempt from centralized matchsize checking. They
>> + * skip the test by setting xt_match.matchsize to -1.
>> */
>> pr_err("%s_tables: %s.%u match: invalid size "
>> "%u (kernel) != (user) %u\n",
>> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
>> new file mode 100644
>> index 0000000..07077c5
>> --- /dev/null
>> +++ b/net/netfilter/xt_bpf.c
>> @@ -0,0 +1,88 @@
>> +/* Xtables module to match packets using a BPF filter.
>> + * Copyright 2012 Google Inc.
>> + * Written by Willem de Bruijn <willemb@google.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/ipv6.h>
>> +#include <linux/filter.h>
>> +#include <net/ip.h>
>> +
>> +#include <linux/netfilter/xt_bpf.h>
>> +#include <linux/netfilter/x_tables.h>
>> +
>> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
>> +MODULE_DESCRIPTION("Xtables: BPF filter match");
>> +MODULE_LICENSE("GPL");
>> +MODULE_ALIAS("ipt_bpf");
>> +MODULE_ALIAS("ip6t_bpf");
>> +
>> +static int bpf_mt_check(const struct xt_mtchk_param *par)
>> +{
>> + struct xt_bpf_info *info = par->matchinfo;
>> + const struct xt_entry_match *match;
>> + struct sock_fprog program;
>> + int expected_len;
>> +
>> + match = container_of(par->matchinfo, const struct xt_entry_match, data);
>> + expected_len = sizeof(struct xt_entry_match) +
>> + sizeof(struct xt_bpf_info) +
>> + (sizeof(struct sock_filter) *
>> + info->bpf_program_num_elem);
>> +
>> + if (match->u.match_size != expected_len) {
>> + pr_info("bpf: check failed: incorrect length\n");
>> + return -EINVAL;
>> + }
>> +
>> + program.len = info->bpf_program_num_elem;
>> + program.filter = info->bpf_program;
>> + if (sk_unattached_filter_create(&info->filter, &program)) {
>> + pr_info("bpf: check failed: parse error\n");
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +{
>> + const struct xt_bpf_info *info = par->matchinfo;
>> +
>> + return SK_RUN_FILTER(info->filter, skb);
>> +}
>> +
>> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
>> +{
>> + const struct xt_bpf_info *info = par->matchinfo;
>> + sk_unattached_filter_destroy(info->filter);
>> +}
>> +
>> +static struct xt_match bpf_mt_reg __read_mostly = {
>> + .name = "bpf",
>> + .revision = 0,
>> + .family = NFPROTO_UNSPEC,
>> + .checkentry = bpf_mt_check,
>> + .match = bpf_mt,
>> + .destroy = bpf_mt_destroy,
>> + .matchsize = -1, /* skip xt_check_match because of dynamic len */
>> + .me = THIS_MODULE,
>> +};
>> +
>> +static int __init bpf_mt_init(void)
>> +{
>> + return xt_register_match(&bpf_mt_reg);
>> +}
>> +
>> +static void __exit bpf_mt_exit(void)
>> +{
>> + xt_unregister_match(&bpf_mt_reg);
>> +}
>> +
>> +module_init(bpf_mt_init);
>> +module_exit(bpf_mt_exit);
>> --
>> 1.7.7.3
>>
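For illustration, the variable-length program that bpf_mt_check() validates above can be sketched from the userspace side in Python. The 8-byte struct sock_filter layout is standard classic BPF; the header padding and the concrete address value are assumptions for the sketch, not part of the patch:

```python
import struct

# struct sock_filter: __u16 code, __u8 jt, __u8 jf, __u32 k  (8 bytes)
SOCK_FILTER = struct.Struct("=HBBI")

def pack_program(insns):
    """Serialize (code, jt, jf, k) tuples back-to-back, as the
    variable-length tail of xt_bpf_info would carry them."""
    return b"".join(SOCK_FILTER.pack(*i) for i in insns)

# The INPUT example from the cover letter as tuples, with $SADDR filled
# in as 10.0.0.1 (0x0a000001) purely for illustration:
program = [
    (0x20, 0, 0, 12),          # ld  [12]   load IPv4 saddr (L3 base)
    (0x15, 0, 1, 0x0a000001),  # jeq #addr  jt next, jf last
    (0x06, 0, 0, 96),          # ret #96    match
    (0x06, 0, 0, 0),           # ret #0     no match
]
blob = pack_program(program)
```

bpf_mt_check() then requires the total match size to equal the fixed headers plus exactly len(program) * 8 bytes, which is what the bpf_program_num_elem field encodes.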
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 20:00 ` Jan Engelhardt
@ 2012-12-05 21:45 ` Willem de Bruijn
2012-12-05 21:50 ` Willem de Bruijn
2012-12-05 22:35 ` Jan Engelhardt
2012-12-06 5:22 ` Pablo Neira Ayuso
1 sibling, 2 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 21:45 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller,
Patrick McHardy, pablo
On Wed, Dec 5, 2012 at 3:00 PM, Jan Engelhardt <jengelh@inai.de> wrote:
> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>
>>Somehow, the first part of this email went missing. Not critical,
>>but for completeness:
>>
>>These two patches each add an xtables match.
>>
>>The xt_priority match is a straightforward addition in the style of
>>xt_mark, adding the option to filter on one more sk_buff field. I
>>have an immediate application for this. The amount of code (in
>>kernel + userspace) to add a single check proved quite large.
>
> Hm so yeah, can't we just place this in xt_mark.c?
I'm happy to do so, but note that that breaks the custom of
having one static struct xt_$NAME for each file xt_$NAME.[ch].
It may be reasonable, as the same issue may keep popping up
as additional sk_buff fields are found useful for filtering. For
instance, skb->queue_mapping could be used in conjunction with
network flow classification (ethtool -N). All the ancillary data
accessible from BPF likely has some use and could be ported
to iptables (rxhash, pkt_type, ...).
To avoid rule explosion, I considered an xt_skbuff match rule that
applies the same mask operation, range and inversion tests, and
takes a field id to select the sk_buff field to operate on. I think
the BPF patch is a better long term solution.
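The mask, range, and inversion tests described above reduce to very little logic. A sketch of the common pattern (names are illustrative, not the xt_priority API):

```python
def field_match(value, mask=0xffffffff, lo=0, hi=0xffffffff, invert=False):
    """Mask a 32-bit skb field, test the result against [lo, hi],
    and optionally invert the verdict."""
    masked = value & mask
    in_range = lo <= masked <= hi
    return in_range != invert
```

For example, field_match(prio, mask=0xff, lo=1, hi=5) accepts priorities whose low byte lies in 1..5, mirroring how a mask lets unrelated regions of the bitfield be ignored.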
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 21:45 ` Willem de Bruijn
@ 2012-12-05 21:50 ` Willem de Bruijn
2012-12-05 22:35 ` Jan Engelhardt
1 sibling, 0 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-05 21:50 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller,
Patrick McHardy, Pablo Neira Ayuso
On Wed, Dec 5, 2012 at 4:45 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Wed, Dec 5, 2012 at 3:00 PM, Jan Engelhardt <jengelh@inai.de> wrote:
>> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>>
>>>Somehow, the first part of this email went missing. Not critical,
>>>but for completeness:
>>>
>>>These two patches each add an xtables match.
>>>
>>>The xt_priority match is a straightforward addition in the style of
>>>xt_mark, adding the option to filter on one more sk_buff field. I
>>>have an immediate application for this. The amount of code (in
>>>kernel + userspace) to add a single check proved quite large.
>>
>> Hm so yeah, can't we just place this in xt_mark.c?
>
> I'm happy to do so, but note that that breaks the custom of
> having one static struct xt_$NAME for each file xt_$NAME.[ch].
>
> It may be reasonable, as the same issue may keep popping up
> as additional sk_buff fields are found useful for filtering. For
> instance, skb->queue_mapping could be used in conjunction with
> network flow classification (ethtool -N).
Bad example: queue_mapping is tx only; I was thinking of rx queues.
> All the ancillary data
> accessible from BPF likely has some use and could be ported
> to iptables (rxhash, pkt_type, ...).
>
> To avoid rule explosion, I considered an xt_skbuff match rule that
> applies the same mask operation, range and inversion tests, and
> takes a field id to select the sk_buff field to operate on. I think
> the BPF patch is a better long term solution.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 21:45 ` Willem de Bruijn
2012-12-05 21:50 ` Willem de Bruijn
@ 2012-12-05 22:35 ` Jan Engelhardt
1 sibling, 0 replies; 19+ messages in thread
From: Jan Engelhardt @ 2012-12-05 22:35 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller,
Patrick McHardy, pablo
On Wednesday 2012-12-05 22:45, Willem de Bruijn wrote:
>>>The xt_priority match is a straightforward addition in the style of
>>>xt_mark, adding the option to filter on one more sk_buff field. I
>>>have an immediate application for this. The amount of code (in
>>>kernel + userspace) to add a single check proved quite large.
>>
>> Hm so yeah, can't we just place this in xt_mark.c?
>
>I'm happy to do so, but note that that breaks the custom of
>having one static struct xt_$NAME for each file xt_$NAME.[ch].
The custom is long gone (just look at xt_mark.c ;-),
because the module overhead is so much more than a function with
an assignment/readout.
>To avoid rule explosion, I considered an xt_skbuff match rule that
>applies the same mask operation, range and inversion tests, and
>takes a field id to select the sk_buff field to operate on. I think
>the BPF patch is a better long term solution.
I can't disagree.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-05 20:00 ` Jan Engelhardt
2012-12-05 21:45 ` Willem de Bruijn
@ 2012-12-06 5:22 ` Pablo Neira Ayuso
2012-12-06 21:12 ` Willem de Bruijn
1 sibling, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-06 5:22 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Willem de Bruijn, netfilter-devel, netdev, Eric Dumazet,
David Miller, kaber
On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>
> >Somehow, the first part of this email went missing. Not critical,
> >but for completeness:
> >
> >These two patches each add an xtables match.
> >
> >The xt_priority match is a straightforward addition in the style of
> >xt_mark, adding the option to filter on one more sk_buff field. I
> >have an immediate application for this. The amount of code (in
> >kernel + userspace) to add a single check proved quite large.
>
> Hm so yeah, can't we just place this in xt_mark.c?
I don't feel this belongs to xt_mark at all.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-06 5:22 ` Pablo Neira Ayuso
@ 2012-12-06 21:12 ` Willem de Bruijn
2012-12-07 7:22 ` Pablo Neira Ayuso
2012-12-07 13:20 ` Pablo Neira Ayuso
0 siblings, 2 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-06 21:12 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet,
David Miller, Patrick McHardy
On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
>> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>>
>> >Somehow, the first part of this email went missing. Not critical,
>> >but for completeness:
>> >
>> >These two patches each add an xtables match.
>> >
>> >The xt_priority match is a straightforward addition in the style of
>> >xt_mark, adding the option to filter on one more sk_buff field. I
>> >have an immediate application for this. The amount of code (in
>> >kernel + userspace) to add a single check proved quite large.
>>
>> Hm so yeah, can't we just place this in xt_mark.c?
>
> I don't feel this belongs to xt_mark at all.
Do you have other concerns, or can I resubmit as is for merging in a
few days if no one raises additional issues?
For this and netfilter changes in general: should these patches be
against git://1984.lsi.us.es/nf-next instead of net-next? This patch
likely applies cleanly there, but I haven't tried yet. Thanks.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-06 21:12 ` Willem de Bruijn
@ 2012-12-07 7:22 ` Pablo Neira Ayuso
2012-12-07 13:20 ` Pablo Neira Ayuso
1 sibling, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-07 7:22 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet,
David Miller, Patrick McHardy
On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote:
> On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
> >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
> >>
> >> >Somehow, the first part of this email went missing. Not critical,
> >> >but for completeness:
> >> >
> >> >These two patches each add an xtables match.
> >> >
> >> >The xt_priority match is a straightforward addition in the style of
> >> >xt_mark, adding the option to filter on one more sk_buff field. I
> >> >have an immediate application for this. The amount of code (in
> >> >kernel + userspace) to add a single check proved quite large.
> >>
> >> Hm so yeah, can't we just place this in xt_mark.c?
> >
> > I don't feel this belongs to xt_mark at all.
>
> Do you have other concerns, or can I resubmit as is for merging in a
> few days if no one raises additional issues?
>
> For this and netfilter changes in general: should these patches be
> against git://1984.lsi.us.es/nf-next instead of net-next? This patch
> likely applies cleanly there, but I haven't tried yet. Thanks.
Please, against nf-next, since this has to go through the netfilter
tree.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-05 20:10 ` Willem de Bruijn
@ 2012-12-07 13:16 ` Pablo Neira Ayuso
2012-12-07 16:56 ` Willem de Bruijn
0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-07 13:16 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber
On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote:
> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Willem,
> >
> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
> >> A new match that executes sk_run_filter on every packet. BPF filters
> >> can access skbuff fields that are out of scope for existing iptables
> >> rules, allow more expressive logic, and on platforms with JIT support
> >> can even be faster.
> >>
> >> I have a corresponding iptables patch that takes `tcpdump -ddd`
> >> output, as used in the examples below. The two parts communicate
> >> using a variable length structure. This is similar to ebt_among,
> >> but new for iptables.
> >>
> >> Verified functionality by inserting an ip source filter on chain
> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping
> >> failed while a rule was active:
> >>
> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
> >
> > I like this BPF idea for iptables.
> >
> > I made a similar extension time ago, but it was taking a file as
> parameter. That file contained the BPF code. I made a simple bison
> > parser that takes BPF code and put it into the bpf array of
> > instructions. It would be a bit more intuitive to define a filter and
> > we can distribute it with iptables.
>
> That's cleaner, indeed. I actually like how tcpdump operates as a
> code generator if you pass -ddd. Unfortunately, it generates code only
> for link layer types of its supported devices, such as DLT_EN10MB and
> DLT_LINUX_SLL. The network layer interface of basic iptables
> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW,
> but that is rarely supported.
Indeed, you'll have to hack on tcpdump to select the offset. In
iptables the base is the layer 3 header. With that change you could
use tcpdump to generate code automagically from its syntax.
> > Let me check on my internal trees, I can put that user-space code
> > somewhere in case you're interested.
>
> Absolutely. I'll be happy to revise to get it in. I'm also considering
> sending a patch to tcpdump to make it generate code independent of the
> installed hardware when specifying -y.
I found a version of the old parser code I made:
http://1984.lsi.us.es/git/nfbpf/
It interprets a filter expressed in a similar way to tcpdump -dd but
it's using the BPF constants. It's quite preliminary and simple if you
look at the code.
Extending it to interpret some syntax similar to tcpdump -d would
make the BPF filter even more readable.
Some time ago I also thought about taking the kernel code that checks
that the filter is correct. Currently you get -EINVAL if you pass a
handcrafted filter which is incorrect, so it's a hard task to debug
what you got wrong.
It could be added to the iptables tree. Or if generic enough for BPF
and the effort is worth, just provide some small library that iptables
can link with and a small compiler/checker to help people develop BPF
filters.
Back to your xt_bpf thing, we can use the file containing the code
instead:
iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP
iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP
We can still allow the inlined filter via --bytecode if you want.
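For reference, the --bytecode string format used in the examples above is "N,code jt jf k,code jt jf k,...", i.e. `tcpdump -ddd` output with commas as instruction separators and a leading instruction count. A short decoding sketch (not part of either patch; the trailing comma and count semantics are inferred from the examples):

```python
def parse_bytecode(s):
    """Split an iptables-style '--bytecode' string into (code, jt, jf, k)
    tuples and check the leading instruction count."""
    fields = [f for f in s.split(",") if f.strip()]
    count, body = int(fields[0]), fields[1:]
    insns = [tuple(int(tok, 0) for tok in f.split()) for f in body]
    if len(insns) != count:
        raise ValueError("instruction count mismatch: %d != %d"
                         % (count, len(insns)))
    return insns

# The OUTPUT example, with $DADDR filled in as 0x0a000001 for illustration:
prog = parse_bytecode("4,32 0 0 16,21 0 1 0x0a000001,6 0 0 96,6 0 0 0,")
```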
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-06 21:12 ` Willem de Bruijn
2012-12-07 7:22 ` Pablo Neira Ayuso
@ 2012-12-07 13:20 ` Pablo Neira Ayuso
2012-12-07 17:26 ` Willem de Bruijn
1 sibling, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-07 13:20 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet,
David Miller, Patrick McHardy
On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote:
> On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
> >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
> >>
> >> >Somehow, the first part of this email went missing. Not critical,
> >> >but for completeness:
> >> >
> >> >These two patches each add an xtables match.
> >> >
>> >> >The xt_priority match is a straightforward addition in the style of
> >> >xt_mark, adding the option to filter on one more sk_buff field. I
> >> >have an immediate application for this. The amount of code (in
> >> >kernel + userspace) to add a single check proved quite large.
> >>
> >> Hm so yeah, can't we just place this in xt_mark.c?
> >
> > I don't feel this belongs to xt_mark at all.
>
> Do you have other concerns, or can I resubmit as is for merging in a
> few days if no one raises additional issues?
In nftables we have the 'meta' extension that allows matching all
skbuff fields (among other things):
http://1984.lsi.us.es/git/nf-next/tree/net/netfilter/nft_meta.c?h=nf_tables8
I think it's the way to go so we stop adding small matches for each
skbuff field.
I don't mind the name if it's xt_skbuff or xt_meta.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-07 13:16 ` Pablo Neira Ayuso
@ 2012-12-07 16:56 ` Willem de Bruijn
2012-12-08 3:31 ` Pablo Neira Ayuso
0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-07 16:56 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber
On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote:
>> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > Hi Willem,
>> >
>> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
>> >> A new match that executes sk_run_filter on every packet. BPF filters
>> >> can access skbuff fields that are out of scope for existing iptables
>> >> rules, allow more expressive logic, and on platforms with JIT support
>> >> can even be faster.
>> >>
>> >> I have a corresponding iptables patch that takes `tcpdump -ddd`
>> >> output, as used in the examples below. The two parts communicate
>> >> using a variable length structure. This is similar to ebt_among,
>> >> but new for iptables.
>> >>
>> >> Verified functionality by inserting an ip source filter on chain
>> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping
>> >> failed while a rule was active:
>> >>
>> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
>> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
>> >
>> > I like this BPF idea for iptables.
>> >
>> > I made a similar extension time ago, but it was taking a file as
>> > parameter. That file contained the BPF code. I made a simple bison
>> > parser that takes BPF code and put it into the bpf array of
>> > instructions. It would be a bit more intuitive to define a filter and
>> > we can distribute it with iptables.
>>
>> That's cleaner, indeed. I actually like how tcpdump operates as a
>> code generator if you pass -ddd. Unfortunately, it generates code only
>> for link layer types of its supported devices, such as DLT_EN10MB and
>> DLT_LINUX_SLL. The network layer interface of basic iptables
>> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW,
>> but that is rarely supported.
>
> Indeed, you'll have to hack on tcpdump to select the offset. In
> iptables the base is the layer 3 header. With that change you could
> use tcpdump to generate code automagically from its syntax.
>
>> > Let me check on my internal trees, I can put that user-space code
>> > somewhere in case you're interested.
>>
>> Absolutely. I'll be happy to revise to get it in. I'm also considering
>> sending a patch to tcpdump to make it generate code independent of the
>> installed hardware when specifying -y.
>
> I found a version of the old parser code I made:
>
> http://1984.lsi.us.es/git/nfbpf/
>
> It interprets a filter expressed in a similar way to tcpdump -dd but
> it's using the BPF constants. It's quite preliminary and simple if you
> look at the code.
>
> Extending it to interpret some syntax similar to tcpdump -d would
> make the BPF filter even more readable.
>
> Some time ago I also thought about taking the kernel code that checks
> that the filter is correct. Currently you get -EINVAL if you pass a
> handcrafted filter which is incorrect, so it's a hard task to debug
> what you got wrong.
>
> It could be added to the iptables tree. Or if generic enough for BPF
> and the effort is worth, just provide some small library that iptables
> can link with and a small compiler/checker to help people develop BPF
> filters.
Or use pcap_compile? I went with the tcpdump output to avoid
introducing a direct dependency on pcap into iptables. One possible
downside I see to pcap_compile vs. developing from scratch is that it
might lag in supporting the LSF ancillary data fields.
> Back to your xt_bpf thing, we can use the file containing the code
> instead:
>
> iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP
> iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP
>
> We can still allow the inlined filter via --bytecode if you want.
I'll add that. I'd like to keep --bytecode to be able to generate the
code inline using backticks.
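As a rough illustration of the DLT mismatch discussed above: a tcpdump filter compiled for DLT_EN10MB addresses absolute loads relative to the Ethernet header, while xt_bpf sees the network header at offset 0. A translator could rebase simple absolute loads by the 14-byte Ethernet header length. This is a sketch only: it rejects filters that read the link-layer header itself and does not handle ancillary (SKF_AD) loads or other addressing modes.

```python
# Classic BPF opcode fields
BPF_CLASS_MASK, BPF_LD = 0x07, 0x00
BPF_MODE_MASK, BPF_ABS = 0xe0, 0x20
ETH_HLEN = 14  # bytes of Ethernet header to strip from offsets

def rebase_en10mb_to_l3(insns):
    """Rewrite absolute-load offsets from an Ethernet-frame base to a
    layer-3 base, as iptables would need."""
    out = []
    for code, jt, jf, k in insns:
        if ((code & BPF_CLASS_MASK) == BPF_LD
                and (code & BPF_MODE_MASK) == BPF_ABS):
            if k < ETH_HLEN:
                raise ValueError("filter reads the link-layer header")
            # note: ancillary (SKF_AD, negative-offset) loads would
            # need special-casing here
            k -= ETH_HLEN
        out.append((code, jt, jf, k))
    return out
```

For example, `ld [26]` (IPv4 saddr at Ethernet base 14+12) becomes `ld [12]`, matching the offsets used in the earlier --bytecode examples.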
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH rfc] netfilter: two xtables matches
2012-12-07 13:20 ` Pablo Neira Ayuso
@ 2012-12-07 17:26 ` Willem de Bruijn
0 siblings, 0 replies; 19+ messages in thread
From: Willem de Bruijn @ 2012-12-07 17:26 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Jan Engelhardt, netfilter-devel, netdev, Eric Dumazet,
David Miller, Patrick McHardy
On Fri, Dec 7, 2012 at 8:20 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Dec 06, 2012 at 04:12:10PM -0500, Willem de Bruijn wrote:
>> On Thu, Dec 6, 2012 at 12:22 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
>> >> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
>> >>
>> >> >Somehow, the first part of this email went missing. Not critical,
>> >> >but for completeness:
>> >> >
>> >> >These two patches each add an xtables match.
>> >> >
>> >> >The xt_priority match is a straightforward addition in the style of
>> >> >xt_mark, adding the option to filter on one more sk_buff field. I
>> >> >have an immediate application for this. The amount of code (in
>> >> >kernel + userspace) to add a single check proved quite large.
>> >>
>> >> Hm so yeah, can't we just place this in xt_mark.c?
>> >
>> > I don't feel this belongs to xt_mark at all.
>>
>> Do you have other concerns, or can I resubmit as is for merging in a
>> few days if no one raises additional issues?
>
> In nftables we have the 'meta' extension that allows matching all
> skbuff fields (among other things):
>
> http://1984.lsi.us.es/git/nf-next/tree/net/netfilter/nft_meta.c?h=nf_tables8
>
> I think it's the way to go so we stop adding small matches for each
> skbuff field.
>
> I don't mind the name if it's xt_skbuff or xt_meta.
Okay. I'll respin right now with one more field to select the skb field to
match on, as a patch against the nf-next tree, and will send that to
netfilter-devel.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-07 16:56 ` Willem de Bruijn
@ 2012-12-08 3:31 ` Pablo Neira Ayuso
2012-12-08 16:02 ` Daniel Borkmann
0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-08 3:31 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber
On Fri, Dec 07, 2012 at 11:56:05AM -0500, Willem de Bruijn wrote:
> On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote:
> >> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >> > Hi Willem,
> >> >
> >> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
> >> >> A new match that executes sk_run_filter on every packet. BPF filters
> >> >> can access skbuff fields that are out of scope for existing iptables
> >> >> rules, allow more expressive logic, and on platforms with JIT support
> >> >> can even be faster.
> >> >>
> >> >> I have a corresponding iptables patch that takes `tcpdump -ddd`
> >> >> output, as used in the examples below. The two parts communicate
> >> >> using a variable length structure. This is similar to ebt_among,
> >> >> but new for iptables.
> >> >>
> >> >> Verified functionality by inserting an ip source filter on chain
> >> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping
> >> >> failed while a rule was active:
> >> >>
> >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
> >> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
> >> >
> >> > I like this BPF idea for iptables.
> >> >
> >> > I made a similar extension time ago, but it was taking a file as
> >> > parameter. That file contained the BPF code. I made a simple bison
> >> > parser that takes BPF code and put it into the bpf array of
> >> > instructions. It would be a bit more intuitive to define a filter and
> >> > we can distribute it with iptables.
> >>
> >> That's cleaner, indeed. I actually like how tcpdump operates as a
> >> code generator if you pass -ddd. Unfortunately, it generates code only
> >> for link layer types of its supported devices, such as DLT_EN10MB and
> >> DLT_LINUX_SLL. The network layer interface of basic iptables
> >> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW,
> >> but that is rarely supported.
> >
> > Indeed, you'll have to hack on tcpdump to select the offset. In
> > iptables the base is the layer 3 header. With that change you could
> > use tcpdump to generate code automagically from its syntax.
> >
> >> > Let me check on my internal trees, I can put that user-space code
> >> > somewhere in case you're interested.
> >>
> >> Absolutely. I'll be happy to revise to get it in. I'm also considering
> >> sending a patch to tcpdump to make it generate code independent of the
> >> installed hardware when specifying -y.
> >
> > I found a version of the old parser code I made:
> >
> > http://1984.lsi.us.es/git/nfbpf/
> >
> > It interprets a filter expressed in a similar way to tcpdump -dd but
> > it's using the BPF constants. It's quite preliminary and simple if you
> > look at the code.
> >
> > Extending it to interpret some syntax similar to tcpdump -d would
> > make the BPF filter even more readable.
> >
> > Some time ago I also thought about taking the kernel code that checks
> > that the filter is correct. Currently you get -EINVAL if you pass a
> > handcrafted filter which is incorrect, so it's a hard task to debug
> > what you got wrong.
> >
> > It could be added to the iptables tree. Or if generic enough for BPF
> > and the effort is worth, just provide some small library that iptables
> > can link with and a small compiler/checker to help people develop BPF
> > filters.
>
> Or use pcap_compile? I went with the tcpdump output to avoid
>> introducing a direct dependency on pcap into iptables. One possible
> downside I see to pcap_compile vs. developing from scratch is that it
> might lag in supporting the LSF ancillary data fields.
I suggest putting the code of that preliminary nfbpf utility into
iptables, to allow reading the BPF filters from a file and putting them
into the BPF array of instructions. I can help with that.
> > Back to your xt_bpf thing, we can use the file containing the code
> > instead:
> >
> > iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP
> > iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP
> >
> > We can still allow the inlined filter via --bytecode if you want.
>
> I'll add that. I'd like to keep --bytecode to be able to generate the
> code inline using backticks.
As said, I'm fine with that, but I'll be really happy if we can
provide some utility to generate that code using backticks for the
masses (in case they want to pass it inlined in that format).
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
2012-12-08 3:31 ` Pablo Neira Ayuso
@ 2012-12-08 16:02 ` Daniel Borkmann
0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2012-12-08 16:02 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Willem de Bruijn, netfilter-devel, netdev, Eric Dumazet,
David Miller, kaber
On Sat, Dec 8, 2012 at 4:31 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Fri, Dec 07, 2012 at 11:56:05AM -0500, Willem de Bruijn wrote:
>> On Fri, Dec 7, 2012 at 8:16 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Wed, Dec 05, 2012 at 03:10:13PM -0500, Willem de Bruijn wrote:
>> >> On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> >> > Hi Willem,
>> >> >
>> >> > On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
>> >> >> A new match that executes sk_run_filter on every packet. BPF filters
>> >> >> can access skbuff fields that are out of scope for existing iptables
>> >> >> rules, allow more expressive logic, and on platforms with JIT support
>> >> >> can even be faster.
>> >> >>
>> >> >> I have a corresponding iptables patch that takes `tcpdump -ddd`
>> >> >> output, as used in the examples below. The two parts communicate
>> >> >> using a variable length structure. This is similar to ebt_among,
>> >> >> but new for iptables.
>> >> >>
>> >> >> Verified functionality by inserting an ip source filter on chain
>> >> >> INPUT and an ip dest filter on chain OUTPUT and noting that ping
>> >> >> failed while a rule was active:
>> >> >>
>> >> >> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
>> >> >> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
>> >> >
>> >> > I like this BPF idea for iptables.
>> >> >
>> >> > I made a similar extension time ago, but it was taking a file as
>> >> > parameter. That file contained the BPF code. I made a simple bison
>> >> > parser that takes BPF code and put it into the bpf array of
>> >> > instructions. It would be a bit more intuitive to define a filter and
>> >> > we can distribute it with iptables.
>> >>
>> >> That's cleaner, indeed. I actually like how tcpdump operates as a
>> >> code generator if you pass -ddd. Unfortunately, it generates code only
>> >> for link layer types of its supported devices, such as DLT_EN10MB and
>> >> DLT_LINUX_SLL. The network layer interface of basic iptables
>> >> (forgetting device dependent mechanisms as used in xt_mac) is DLT_RAW,
>> >> but that is rarely supported.
>> >
>> > Indeed, you'll have to hack on tcpdump to select the offset. In
>> > iptables the base is the layer 3 header. With that change you could
>> > use tcpdump to generate code automagically from its syntax.
>> >
>> >> > Let me check on my internal trees, I can put that user-space code
>> >> > somewhere in case you're interested.
>> >>
>> >> Absolutely. I'll be happy to revise to get it in. I'm also considering
>> >> sending a patch to tcpdump to make it generate code independent of the
>> >> installed hardware when specifying -y.
>> >
>> > I found a version of the old parser code I made:
>> >
>> > http://1984.lsi.us.es/git/nfbpf/
>> >
>> > It interprets a filter expressed in a similar way to tcpdump -dd but
>> > it's using the BPF constants. It's quite preliminary and simple if you
>> > look at the code.
>> >
>> > Extending it to interpret some syntax similar to tcpdump -d would
>> > make the BPF filter even more readable.
>> >
>> > Some time ago I also thought about reusing the kernel code that
>> > checks that the filter is correct. Currently you get -EINVAL if you
>> > pass a handcrafted filter that is incorrect, so it's a hard task to
>> > debug what you got wrong.
>> >
>> > It could be added to the iptables tree. Or, if it is generic enough
>> > for BPF and the effort is worthwhile, just provide some small
>> > library that iptables can link with and a small compiler/checker to
>> > help people develop BPF filters.
>>
>> Or use pcap_compile? I went with the tcpdump output to avoid
>> introducing a direct dependency on pcap to iptables. One possible
>> downside I see to pcap_compile vs. developing from scratch is that it
>> might lag in supporting the LSF ancillary data fields.
>
> I suggest putting the code of that preliminary nfbpf utility into
> iptables, to allow reading the BPF filters from a file and putting
> them into the BPF array of instructions. I can help with that.
>
>> > Back to your xt_bpf thing, we can use the file containing the code
>> > instead:
>> >
>> > iptables -v -A INPUT -m bpf --bytecode-file filter1.bpf -j DROP
>> > iptables -v -A OUTPUT -m bpf --bytecode-file filter2.bpf -j DROP
>> >
>> > We can still allow the inlined filter via --bytecode if you want.
>>
>> I'll add that. I'd like to keep --bytecode to be able to generate
>> the code inline using backticks.
>
> As I said, I'm fine with that, but I'll be really happy if we can
> provide some utility that generates that code for the masses (in case
> they want to pass it inlined in that format using backticks).
If it helps, you could use "bpfc", or rip off its code to avoid a
dependency; it's part of the netsniff-ng toolkit.
It can be used like:
bpfc examples/bpfc/arp.bpf
{ 0x28, 0, 0, 0x0000000c },
{ 0x15, 0, 1, 0x00000806 },
{ 0x6, 0, 0, 0xffffffff },
{ 0x6, 0, 0, 0x00000000 },
where arp.bpf is, for instance:
_main:
ldh [12]
jeq #0x806, keep, drop
keep:
ret #0xffffffff
drop:
ret #0
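The hex opcodes in the compiled output can be cross-checked against the classic BPF constants from linux/filter.h: each code is the OR of an instruction class with a size or condition and an addressing mode. A quick sketch:

```python
# Classic BPF constants, as defined in linux/filter.h.
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06   # instruction classes
BPF_H   = 0x08   # halfword load size
BPF_ABS = 0x20   # absolute addressing mode
BPF_JEQ = 0x10   # jump if accumulator == operand
BPF_K   = 0x00   # immediate (constant) operand

ldh_abs = BPF_LD | BPF_H | BPF_ABS    # 0x28: ldh [12]
jeq_k   = BPF_JMP | BPF_JEQ | BPF_K   # 0x15: jeq #0x806
ret_k   = BPF_RET | BPF_K             # 0x06: ret #k
print(hex(ldh_abs), hex(jeq_k), hex(ret_k))
```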
"Core" files are: src/bpf_lexer.l, src/bpf_parser.y
It also supports all the Linux ancillary (ANC) operations that were
added to the kernel (like VLAN, XOR and so on). I started work on a
higher-level language for that, which would translate to an example
such as the one above (which in turn translates to opcodes), but I
didn't have time to continue it.
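Outside iptables, the same four-instruction ARP program can be attached to a socket through the classic SO_ATTACH_FILTER option, the same path that ends in sk_run_filter. A hedged Python/ctypes sketch for Linux (the struct layouts mirror sock_filter and sock_fprog from linux/filter.h; the option value 26 comes from asm-generic/socket.h):

```python
import ctypes
import socket

class SockFilter(ctypes.Structure):
    # Mirrors struct sock_filter: u16 code, u8 jt, u8 jf, u32 k.
    _fields_ = [('code', ctypes.c_uint16), ('jt', ctypes.c_uint8),
                ('jf', ctypes.c_uint8), ('k', ctypes.c_uint32)]

class SockFprog(ctypes.Structure):
    # Mirrors struct sock_fprog: u16 len, struct sock_filter *filter.
    _fields_ = [('len', ctypes.c_uint16),
                ('filter', ctypes.POINTER(SockFilter))]

# The compiled ARP filter from the bpfc output above.
insns = (SockFilter * 4)(
    SockFilter(0x28, 0, 0, 0x0000000c),  # ldh [12]
    SockFilter(0x15, 0, 1, 0x00000806),  # jeq #0x806, keep, drop
    SockFilter(0x06, 0, 0, 0xffffffff),  # keep: ret #0xffffffff
    SockFilter(0x06, 0, 0, 0x00000000),  # drop: ret #0
)
prog = SockFprog(len(insns), insns)

SO_ATTACH_FILTER = 26  # from asm-generic/socket.h
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, bytes(prog))
s.close()
```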
Thread overview: 19+ messages
2012-12-05 19:22 [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2012-12-05 19:22 ` [PATCH 1/2] netfilter: add xt_priority xtables match Willem de Bruijn
2012-12-05 19:22 ` [PATCH 2/2] netfilter: add xt_bpf " Willem de Bruijn
2012-12-05 19:48 ` Pablo Neira Ayuso
2012-12-05 20:10 ` Willem de Bruijn
2012-12-07 13:16 ` Pablo Neira Ayuso
2012-12-07 16:56 ` Willem de Bruijn
2012-12-08 3:31 ` Pablo Neira Ayuso
2012-12-08 16:02 ` Daniel Borkmann
2012-12-05 19:28 ` [PATCH rfc] netfilter: two xtables matches Willem de Bruijn
2012-12-05 20:00 ` Jan Engelhardt
2012-12-05 21:45 ` Willem de Bruijn
2012-12-05 21:50 ` Willem de Bruijn
2012-12-05 22:35 ` Jan Engelhardt
2012-12-06 5:22 ` Pablo Neira Ayuso
2012-12-06 21:12 ` Willem de Bruijn
2012-12-07 7:22 ` Pablo Neira Ayuso
2012-12-07 13:20 ` Pablo Neira Ayuso
2012-12-07 17:26 ` Willem de Bruijn