netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next 0/9] nftables: add zone support to ct statement
@ 2017-02-03 12:35 Florian Westphal
  2017-02-03 12:35 ` [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support Florian Westphal
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel

This adds the ability to set the conntrack zone from nftables, i.e.
native replacement for -j CT --zone $number.

See individual patches for details.
This will need more documentation and exposure of the builtin
hook priorities (e.g. via defines?) so users can more easily
see whats happening.

Pablo suggested to allow something like

hook prerouting prio $raw;
or even
hook prerouting prio $conntrack - 1;

instead of the 'awkward' use of the actual numbers used by the kernel
('priority -300' to hook at same priority as raw table).

However, this series doesn't contain any of that, so users will
have to use priorities between -399 and -199 (i.e. after defrag and
before conntrack pickup) to assign zones.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-08  9:28   ` Pablo Neira Ayuso
  2017-02-03 12:35 ` [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind Florian Westphal
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Just like with counters the direction attribute is optional.
We set priv->dir to MAX unconditionally to avoid duplicating the assignment
for all keys with optional direction.

For keys where direction is mandatory, existing code already returns
an error.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nft_ct.c                   | 22 +++++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index b00a05d1ee56..b972e72623c2 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -864,6 +864,7 @@ enum nft_rt_attributes {
  * @NFT_CT_PKTS: conntrack packets
  * @NFT_CT_BYTES: conntrack bytes
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
+ * @NFT_CT_ZONE: conntrack zone
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -883,6 +884,7 @@ enum nft_ct_keys {
 	NFT_CT_PKTS,
 	NFT_CT_BYTES,
 	NFT_CT_AVGPKT,
+	NFT_CT_ZONE,
 };
 
 /**
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 66a2377510e1..5bd4cdfdcda5 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -151,6 +151,18 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
 	case NFT_CT_PROTOCOL:
 		*dest = nf_ct_protonum(ct);
 		return;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	case NFT_CT_ZONE: {
+		const struct nf_conntrack_zone *zone = nf_ct_zone(ct);
+
+		if (priv->dir < IP_CT_DIR_MAX)
+			*dest = nf_ct_zone_id(zone, priv->dir);
+		else
+			*dest = zone->id;
+
+		return;
+	}
+#endif
 	default:
 		break;
 	}
@@ -266,6 +278,7 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
 	int err;
 
 	priv->key = ntohl(nla_get_be32(tb[NFTA_CT_KEY]));
+	priv->dir = IP_CT_DIR_MAX;
 	switch (priv->key) {
 	case NFT_CT_DIRECTION:
 		if (tb[NFTA_CT_DIRECTION] != NULL)
@@ -333,11 +346,13 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
 	case NFT_CT_BYTES:
 	case NFT_CT_PKTS:
 	case NFT_CT_AVGPKT:
-		/* no direction? return sum of original + reply */
-		if (tb[NFTA_CT_DIRECTION] == NULL)
-			priv->dir = IP_CT_DIR_MAX;
 		len = sizeof(u64);
 		break;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	case NFT_CT_ZONE:
+		len = sizeof(u16);
+		break;
+#endif
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -465,6 +480,7 @@ static int nft_ct_get_dump(struct sk_buff *skb, const struct nft_expr *expr)
 	case NFT_CT_BYTES:
 	case NFT_CT_PKTS:
 	case NFT_CT_AVGPKT:
+	case NFT_CT_ZONE:
 		if (priv->dir < IP_CT_DIR_MAX &&
 		    nla_put_u8(skb, NFTA_CT_DIRECTION, priv->dir))
 			goto nla_put_failure;
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
  2017-02-03 12:35 ` [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-08  9:29   ` Pablo Neira Ayuso
  2017-02-03 12:35 ` [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support Florian Westphal
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Next patch will add ZONE_ID set support which will need similar
error unwind (put operation) as conntrack labels.

Prepare for this: remove the 'label_got' boolean in favor
of a switch statement that can be extended in next patch.

As we already have that in the set_destroy function place that in
a separate function and call it from the set init function.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_ct.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 5bd4cdfdcda5..2d82df2737da 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -386,12 +386,24 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
 	return 0;
 }
 
+static void __nft_ct_set_destroy(const struct nft_ctx *ctx, struct nft_ct *priv)
+{
+	switch (priv->key) {
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+	case NFT_CT_LABELS:
+		nf_connlabels_put(ctx->net);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
 static int nft_ct_set_init(const struct nft_ctx *ctx,
 			   const struct nft_expr *expr,
 			   const struct nlattr * const tb[])
 {
 	struct nft_ct *priv = nft_expr_priv(expr);
-	bool label_got = false;
 	unsigned int len;
 	int err;
 
@@ -412,7 +424,6 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 		err = nf_connlabels_get(ctx->net, (len * BITS_PER_BYTE) - 1);
 		if (err)
 			return err;
-		label_got = true;
 		break;
 #endif
 	default:
@@ -431,8 +442,7 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 	return 0;
 
 err1:
-	if (label_got)
-		nf_connlabels_put(ctx->net);
+	__nft_ct_set_destroy(ctx, priv);
 	return err;
 }
 
@@ -447,16 +457,7 @@ static void nft_ct_set_destroy(const struct nft_ctx *ctx,
 {
 	struct nft_ct *priv = nft_expr_priv(expr);
 
-	switch (priv->key) {
-#ifdef CONFIG_NF_CONNTRACK_LABELS
-	case NFT_CT_LABELS:
-		nf_connlabels_put(ctx->net);
-		break;
-#endif
-	default:
-		break;
-	}
-
+	__nft_ct_set_destroy(ctx, priv);
 	nft_ct_netns_put(ctx->net, ctx->afi->family);
 }
 
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
  2017-02-03 12:35 ` [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support Florian Westphal
  2017-02-03 12:35 ` [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-08  9:29   ` Pablo Neira Ayuso
  2017-02-03 12:35 ` [PATCH libnftnl 4/9] src: ct: add zone support Florian Westphal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

zones allow tracking multiple connections sharing identical tuples,
this is needed e.g. when tracking distinct vlans with overlapping ip
addresses (conntrack is l2 agnostic).

Thus the zone has to be set before the packet is picked up by the
connection tracker.  This is done by means of 'conntrack templates' which
are conntrack structures used solely to pass this info from one netfilter
hook to the next.

The iptables CT target instantiates these connection tracking templates
once per rule, i.e. the template is fixed/tied to particular zone, can
be read-only and therefore be re-used by as many skbs simultaneously as
needed.

We can't follow this model because we want to take the zone id from
an sreg at rule eval time so we could e.g. fill in the zone id from
the packets vlan id or a e.g. nftables key : value maps.

To avoid cost of per packet alloc/free of the template, use a percpu
template 'scratch' object and use the refcount to detect the (unlikely)
case where the template is still attached to another skb (i.e., previous
skb was nfqueued ...).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_ct.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 143 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 2d82df2737da..c6b8022c0e47 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -32,6 +32,11 @@ struct nft_ct {
 	};
 };
 
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+static DEFINE_PER_CPU(struct nf_conn *, nft_ct_pcpu_template);
+static unsigned int nft_ct_pcpu_template_refcnt __read_mostly;
+#endif
+
 static u64 nft_ct_get_eval_counter(const struct nf_conn_counter *c,
 				   enum nft_ct_keys k,
 				   enum ip_conntrack_dir d)
@@ -191,6 +196,53 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
 	regs->verdict.code = NFT_BREAK;
 }
 
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+static void nft_ct_set_zone_eval(const struct nft_expr *expr,
+				 struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	struct nf_conntrack_zone zone = { .dir = NF_CT_DEFAULT_ZONE_DIR };
+	const struct nft_ct *priv = nft_expr_priv(expr);
+	struct sk_buff *skb = pkt->skb;
+	enum ip_conntrack_info ctinfo;
+	u16 value = regs->data[priv->sreg];
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) /* already tracked */
+		return;
+
+	zone.id = value;
+
+	switch (priv->dir) {
+	case IP_CT_DIR_ORIGINAL:
+		zone.dir = NF_CT_ZONE_DIR_ORIG;
+		break;
+	case IP_CT_DIR_REPLY:
+		zone.dir = NF_CT_ZONE_DIR_REPL;
+		break;
+	default:
+		break;
+	}
+
+	ct = this_cpu_read(nft_ct_pcpu_template);
+
+	if (likely(atomic_read(&ct->ct_general.use) == 1)) {
+		nf_ct_zone_add(ct, &zone);
+	} else {
+		/* previous skb got queued to userspace */
+		ct = nf_ct_tmpl_alloc(nft_net(pkt), &zone, GFP_ATOMIC);
+		if (!ct) {
+			regs->verdict.code = NF_DROP;
+			return;
+		}
+	}
+
+	atomic_inc(&ct->ct_general.use);
+	nf_ct_set(skb, ct, IP_CT_NEW);
+}
+#endif
+
 static void nft_ct_set_eval(const struct nft_expr *expr,
 			    struct nft_regs *regs,
 			    const struct nft_pktinfo *pkt)
@@ -269,6 +321,45 @@ static void nft_ct_netns_put(struct net *net, uint8_t family)
 		nf_ct_netns_put(net, family);
 }
 
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+static void nft_ct_tmpl_put_pcpu(void)
+{
+	struct nf_conn *ct;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		ct = per_cpu(nft_ct_pcpu_template, cpu);
+		if (!ct)
+			break;
+		nf_ct_put(ct);
+		per_cpu(nft_ct_pcpu_template, cpu) = NULL;
+	}
+}
+
+static bool nft_ct_tmpl_alloc_pcpu(void)
+{
+	struct nf_conntrack_zone zone = { .id = 0 };
+	struct nf_conn *tmp;
+	int cpu;
+
+	if (nft_ct_pcpu_template_refcnt)
+		return true;
+
+	for_each_possible_cpu(cpu) {
+		tmp = nf_ct_tmpl_alloc(&init_net, &zone, GFP_KERNEL);
+		if (!tmp) {
+			nft_ct_tmpl_put_pcpu();
+			return false;
+		}
+
+		atomic_set(&tmp->ct_general.use, 1);
+		per_cpu(nft_ct_pcpu_template, cpu) = tmp;
+	}
+
+	return true;
+}
+#endif
+
 static int nft_ct_get_init(const struct nft_ctx *ctx,
 			   const struct nft_expr *expr,
 			   const struct nlattr * const tb[])
@@ -394,6 +485,11 @@ static void __nft_ct_set_destroy(const struct nft_ctx *ctx, struct nft_ct *priv)
 		nf_connlabels_put(ctx->net);
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	case NFT_CT_ZONE:
+		if (--nft_ct_pcpu_template_refcnt == 0)
+			nft_ct_tmpl_put_pcpu();
+#endif
 	default:
 		break;
 	}
@@ -407,6 +503,7 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 	unsigned int len;
 	int err;
 
+	priv->dir = IP_CT_DIR_MAX;
 	priv->key = ntohl(nla_get_be32(tb[NFTA_CT_KEY]));
 	switch (priv->key) {
 #ifdef CONFIG_NF_CONNTRACK_MARK
@@ -426,10 +523,28 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 			return err;
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+	case NFT_CT_ZONE:
+		if (!nft_ct_tmpl_alloc_pcpu())
+			return -ENOMEM;
+		nft_ct_pcpu_template_refcnt++;
+		break;
+#endif
 	default:
 		return -EOPNOTSUPP;
 	}
 
+	if (tb[NFTA_CT_DIRECTION]) {
+		priv->dir = nla_get_u8(tb[NFTA_CT_DIRECTION]);
+		switch (priv->dir) {
+		case IP_CT_DIR_ORIGINAL:
+		case IP_CT_DIR_REPLY:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
 	priv->sreg = nft_parse_register(tb[NFTA_CT_SREG]);
 	err = nft_validate_register_load(priv->sreg, len);
 	if (err < 0)
@@ -504,6 +619,17 @@ static int nft_ct_set_dump(struct sk_buff *skb, const struct nft_expr *expr)
 		goto nla_put_failure;
 	if (nla_put_be32(skb, NFTA_CT_KEY, htonl(priv->key)))
 		goto nla_put_failure;
+
+	switch (priv->key) {
+	case NFT_CT_ZONE:
+		if (priv->dir < IP_CT_DIR_MAX &&
+		    nla_put_u8(skb, NFTA_CT_DIRECTION, priv->dir))
+			goto nla_put_failure;
+		break;
+	default:
+		break;
+	}
+
 	return 0;
 
 nla_put_failure:
@@ -529,6 +655,17 @@ static const struct nft_expr_ops nft_ct_set_ops = {
 	.dump		= nft_ct_set_dump,
 };
 
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+static const struct nft_expr_ops nft_ct_set_zone_ops = {
+	.type		= &nft_ct_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_ct)),
+	.eval		= nft_ct_set_zone_eval,
+	.init		= nft_ct_set_init,
+	.destroy	= nft_ct_set_destroy,
+	.dump		= nft_ct_set_dump,
+};
+#endif
+
 static const struct nft_expr_ops *
 nft_ct_select_ops(const struct nft_ctx *ctx,
 		    const struct nlattr * const tb[])
@@ -542,8 +679,13 @@ nft_ct_select_ops(const struct nft_ctx *ctx,
 	if (tb[NFTA_CT_DREG])
 		return &nft_ct_get_ops;
 
-	if (tb[NFTA_CT_SREG])
+	if (tb[NFTA_CT_SREG]) {
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+		if (nla_get_be32(tb[NFTA_CT_KEY]) == htonl(NFT_CT_ZONE))
+			return &nft_ct_set_zone_ops;
+#endif
 		return &nft_ct_set_ops;
+	}
 
 	return ERR_PTR(-EINVAL);
 }
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH libnftnl 4/9] src: ct: add zone support
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (2 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-19 19:22   ` Pablo Neira Ayuso
  2017-02-03 12:35 ` [PATCH nftables 5/9] src: add host byte order integer type Florian Westphal
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter/nf_tables.h | 2 ++
 src/expr/ct.c                       | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..b972e72623c2 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -864,6 +864,7 @@ enum nft_rt_attributes {
  * @NFT_CT_PKTS: conntrack packets
  * @NFT_CT_BYTES: conntrack bytes
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
+ * @NFT_CT_ZONE: conntrack zone
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -883,6 +884,7 @@ enum nft_ct_keys {
 	NFT_CT_PKTS,
 	NFT_CT_BYTES,
 	NFT_CT_AVGPKT,
+	NFT_CT_ZONE,
 };
 
 /**
diff --git a/src/expr/ct.c b/src/expr/ct.c
index d3d352e9f959..cdd08e95c86c 100644
--- a/src/expr/ct.c
+++ b/src/expr/ct.c
@@ -32,7 +32,7 @@ struct nftnl_expr_ct {
 #define IP_CT_DIR_REPLY		1
 
 #ifndef NFT_CT_MAX
-#define NFT_CT_MAX (NFT_CT_AVGPKT + 1)
+#define NFT_CT_MAX (NFT_CT_ZONE + 1)
 #endif
 
 static int
@@ -170,6 +170,7 @@ static const char *ctkey2str_array[NFT_CT_MAX] = {
 	[NFT_CT_PKTS]		= "packets",
 	[NFT_CT_BYTES]		= "bytes",
 	[NFT_CT_AVGPKT]		= "avgpkt",
+	[NFT_CT_ZONE]		= "zone",
 };
 
 static const char *ctkey2str(uint32_t ctkey)
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (3 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH libnftnl 4/9] src: ct: add zone support Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-06 17:31   ` Pablo Neira Ayuso
  2017-02-03 12:35 ` [PATCH nftables 6/9] src: add conntrack zone support Florian Westphal
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This is needed once we add support to set a zone, as in

ct zone set 42

Using integer_type makes nft use big-endian representation of the zone id
instead of the required host byte order.

When using 'ct zone 1', things will work because the (implicit) relational
operation makes sure that the left and right sides have same byte order.

In the statement case the lack of relop means we either need to convert
ourselves (the ct template contains endianess info), or use a dedicated type
(the latter is the reason why setting a mark will 'just work' since the
 mark type takes care of it).

The dedicated type has the advantage that it also works when maps are used:

ct zone set mark map { 1 : 10, 2 : 20, 3 : 30 }

... which is not easy to do with current map/set code, its endianess
settings rely on dtype->byteorder (i.e., it will always set BIG_ENDIAN
when we'd use integer_type for the zone).

Using evaluation context seems like a nightmare because several
places during eval steps can re-set this information, and propagating
the template info means to pollute generic code with something specific
to ct.

It seems like a future removal of all .byteorder members in the templates
in favor of using appropriate types might be a good idea.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/datatype.h |  2 ++
 src/datatype.c     | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/datatype.h b/include/datatype.h
index 9f127f2954e3..8c1c827253be 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -82,6 +82,7 @@ enum datatypes {
 	TYPE_DSCP,
 	TYPE_ECN,
 	TYPE_FIB_ADDR,
+	TYPE_U32,
 	__TYPE_MAX
 };
 #define TYPE_MAX		(__TYPE_MAX - 1)
@@ -231,6 +232,7 @@ extern const struct datatype icmp_code_type;
 extern const struct datatype icmpv6_code_type;
 extern const struct datatype icmpx_code_type;
 extern const struct datatype time_type;
+extern const struct datatype u32_type;
 
 extern const struct datatype *concat_type_alloc(uint32_t type);
 extern void concat_type_destroy(const struct datatype *dtype);
diff --git a/src/datatype.c b/src/datatype.c
index 1518606a3f89..cab42d47f0f0 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -48,6 +48,7 @@ static const struct datatype *datatypes[TYPE_MAX + 1] = {
 	[TYPE_ICMP_CODE]	= &icmp_code_type,
 	[TYPE_ICMPV6_CODE]	= &icmpv6_code_type,
 	[TYPE_ICMPX_CODE]	= &icmpx_code_type,
+	[TYPE_U32]		= &u32_type,
 };
 
 void datatype_register(const struct datatype *dtype)
@@ -1057,3 +1058,12 @@ struct error_record *rate_parse(const struct location *loc, const char *str,
 
 	return NULL;
 }
+
+const struct datatype u32_type = {
+	.type		= TYPE_U32,
+	.name		= "u32",
+	.desc		= "32bit host endian integer",
+	.size		= 4 * BITS_PER_BYTE,
+	.byteorder	= BYTEORDER_HOST_ENDIAN,
+	.basetype	= &integer_type,
+};
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nftables 6/9] src: add conntrack zone support
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (4 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH nftables 5/9] src: add host byte order integer type Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-03 12:35 ` [PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement Florian Westphal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This enables zone get/set support.

As the zone can be optionally tied to a direction as well we need a new
token for this (unless we turn reply/original into tokens in which case
we could handle zone via STRING).

There was some discussion on how zone set support should be handled,
especially 'zone set 1'.

There are several issues to consider:

1. its not possible to change a zone 'later on', any given
conntrack flow has exactly one zone for its entire lifetime.

2. to create conntracks in a given zone, the zone therefore has to be
assigned *before* the packet gets picked up by conntrack (so that lookup
finds the correct existing flow or the flow is created with the desired
zone id).  In iptables, this is enforced because zones are assigned with
CT target and this is restricted to the 'raw' table in iptables, which
runs after defragmentation but before connection tracking.

3. Thus, in nftables the 'ct zone set' rule needs to hook before
conntrack too, e.g. via

 table raw {
  chain pre {
   type filter hook prerouting priority -300;
   iif eth3 ct zone set 23
  }
  chain out {
   type filter hook output priority -300;
   oif eth3 ct zone set 23
  }
 }

... but this is not enforced.

There were two alternatives to better document this.
One was to use an explicit 'template' keyword:
  nft ... template zone set 23

... but 'connection tracking templates' are a kernel detail
that users should not and need not know about.

The other one was to use the meta keyword instead since
we're (from a practical point of view) assigning the zone to
the packet, not the conntrack:

 nft ... meta zone set 23

However, next patch also supports 'directional' zones, and

 nft ... meta original zone 23

makes no sense because 'direction' refers to a direction as understood
by the connection tracker.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 doc/nft.xml                         | 10 +++++++++-
 include/linux/netfilter/nf_tables.h |  1 +
 src/ct.c                            |  2 ++
 src/parser_bison.y                  | 10 ++++++----
 src/scanner.l                       |  1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/doc/nft.xml b/doc/nft.xml
index 78e112f3974b..0a81728789bf 100644
--- a/doc/nft.xml
+++ b/doc/nft.xml
@@ -2126,7 +2126,8 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
 				direction before the conntrack key, others must be used directly because they are direction agnostic.
 				The <command>packets</command>, <command>bytes</command> and <command>avgpkt</command> keywords can be
 				used with or without a direction. If the direction is omitted, the sum of the original and the reply
-				direction is returned.
+				direction is returned.  The same is true for the <command>zone</command>, if a direction is given, the zone
+				is only matched if the zone id is tied to the given direction.
 			</para>
 			<para>
 				<cmdsynopsis>
@@ -2144,6 +2145,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
 						<arg>bytes</arg>
 						<arg>packets</arg>
 						<arg>avgpkt</arg>
+						<arg>zone</arg>
 					</group>
 				</cmdsynopsis>
 				<cmdsynopsis>
@@ -2162,6 +2164,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
 						<arg>bytes</arg>
 						<arg>packets</arg>
 						<arg>avgpkt</arg>
+						<arg>zone</arg>
 					</group>
 				</cmdsynopsis>
 			</para>
@@ -2260,6 +2263,11 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
 								<entry>average bytes per packet, see description for <command>packets</command> keyword</entry>
 								<entry>integer (64 bit)</entry>
 							</row>
+							<row>
+								<entry>zone</entry>
+								<entry>conntrack zone</entry>
+								<entry>integer (16 bit)</entry>
+							</row>
 						</tbody>
 					</tgroup>
 				</table>
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..fc0ed47d974d 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -883,6 +883,7 @@ enum nft_ct_keys {
 	NFT_CT_PKTS,
 	NFT_CT_BYTES,
 	NFT_CT_AVGPKT,
+	NFT_CT_ZONE,
 };
 
 /**
diff --git a/src/ct.c b/src/ct.c
index 31c7a4b1beda..dffa0e5fa44a 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -234,6 +234,8 @@ static const struct ct_template ct_templates[] = {
 					      BYTEORDER_HOST_ENDIAN, 64),
 	[NFT_CT_AVGPKT]		= CT_TEMPLATE("avgpkt", &integer_type,
 					      BYTEORDER_HOST_ENDIAN, 64),
+	[NFT_CT_ZONE]		= CT_TEMPLATE("zone", &u32_type,
+					      BYTEORDER_HOST_ENDIAN, 16),
 };
 
 static void ct_expr_print(const struct expr *expr)
diff --git a/src/parser_bison.y b/src/parser_bison.y
index d543e3ea2515..14d924810f9a 100644
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -357,6 +357,7 @@ static void location_update(struct location *loc, struct location *rhs, int n)
 %token L3PROTOCOL		"l3proto"
 %token PROTO_SRC		"proto-src"
 %token PROTO_DST		"proto-dst"
+%token ZONE			"zone"
 
 %token COUNTER			"counter"
 %token NAME			"name"
@@ -610,7 +611,7 @@ static void location_update(struct location *loc, struct location *rhs, int n)
 
 %type <expr>			ct_expr
 %destructor { expr_free($$); }	ct_expr
-%type <val>			ct_key		ct_key_dir	ct_key_counters
+%type <val>			ct_key		ct_key_dir	ct_key_dir_optional
 
 %type <expr>			fib_expr
 %destructor { expr_free($$); }	fib_expr
@@ -2949,7 +2950,7 @@ ct_expr			: 	CT	ct_key
 ct_key			:	L3PROTOCOL	{ $$ = NFT_CT_L3PROTOCOL; }
 			|	PROTOCOL	{ $$ = NFT_CT_PROTOCOL; }
 			|	MARK		{ $$ = NFT_CT_MARK; }
-			|	ct_key_counters
+			|	ct_key_dir_optional
 			;
 ct_key_dir		:	SADDR		{ $$ = NFT_CT_SRC; }
 			|	DADDR		{ $$ = NFT_CT_DST; }
@@ -2957,12 +2958,13 @@ ct_key_dir		:	SADDR		{ $$ = NFT_CT_SRC; }
 			|	PROTOCOL	{ $$ = NFT_CT_PROTOCOL; }
 			|	PROTO_SRC	{ $$ = NFT_CT_PROTO_SRC; }
 			|	PROTO_DST	{ $$ = NFT_CT_PROTO_DST; }
-			|	ct_key_counters
+			|	ct_key_dir_optional
 			;
 
-ct_key_counters		:	BYTES		{ $$ = NFT_CT_BYTES; }
+ct_key_dir_optional	:	BYTES		{ $$ = NFT_CT_BYTES; }
 			|	PACKETS		{ $$ = NFT_CT_PKTS; }
 			|	AVGPKT		{ $$ = NFT_CT_AVGPKT; }
+			|	ZONE		{ $$ = NFT_CT_ZONE; }
 			;
 
 ct_stmt			:	CT	ct_key		SET	expr
diff --git a/src/scanner.l b/src/scanner.l
index d0d25ea94600..bb7e83c3cfd7 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -460,6 +460,7 @@ addrstring	({macaddr}|{ip4addr}|{ip6addr})
 "l3proto"		{ return L3PROTOCOL; }
 "proto-src"		{ return PROTO_SRC; }
 "proto-dst"		{ return PROTO_DST; }
+"zone"			{ return ZONE; }
 
 "numgen"		{ return NUMGEN; }
 "inc"			{ return INC; }
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (5 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH nftables 6/9] src: add conntrack zone support Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-03 12:35 ` [PATCH nftables 8/9] src: support zone set statement with optional direction Florian Westphal
  2017-02-03 12:35 ` [PATCH nftables 9/9] tests: add test entries for conntrack zones Florian Westphal
  8 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Once directional zone support is added we also need to print the
direction of the statement, so factor the common code to re-use
this helper from the statement print function.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 src/ct.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/ct.c b/src/ct.c
index dffa0e5fa44a..7e09c5b246b2 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -238,22 +238,27 @@ static const struct ct_template ct_templates[] = {
 					      BYTEORDER_HOST_ENDIAN, 16),
 };
 
-static void ct_expr_print(const struct expr *expr)
+static void ct_print(enum nft_ct_keys key, int8_t dir)
 {
 	const struct symbolic_constant *s;
 
 	printf("ct ");
-	if (expr->ct.direction < 0)
+	if (dir < 0)
 		goto done;
 
 	for (s = ct_dir_tbl.symbols; s->identifier != NULL; s++) {
-		if (expr->ct.direction == (int) s->value) {
+		if (dir == (int)s->value) {
 			printf("%s ", s->identifier);
 			break;
 		}
 	}
  done:
-	printf("%s", ct_templates[expr->ct.key].token);
+	printf("%s", ct_templates[key].token);
+}
+
+static void ct_expr_print(const struct expr *expr)
+{
+	ct_print(expr->ct.key, expr->ct.direction);
 }
 
 static bool ct_expr_cmp(const struct expr *e1, const struct expr *e2)
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nftables 8/9] src: support zone set statement with optional direction
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (6 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  2017-02-03 12:35 ` [PATCH nftables 9/9] tests: add test entries for conntrack zones Florian Westphal
  8 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

nft automatically understands 'ct zone set 1' but when a direction is
specified too we get a parser error since they are currently only
allowed for plain ct expressions.

This permits the existing syntax ('ct original zone') for all tokens with
an optional direction also for set statements.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/statement.h       |  2 ++
 src/ct.c                  |  7 +++++--
 src/netlink_delinearize.c |  6 +++++-
 src/netlink_linearize.c   |  4 ++++
 src/parser_bison.y        | 17 +++++++++++++++--
 5 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/statement.h b/include/statement.h
index 8f874c881bd9..317d53e26140 100644
--- a/include/statement.h
+++ b/include/statement.h
@@ -127,10 +127,12 @@ struct ct_stmt {
 	enum nft_ct_keys		key;
 	const struct ct_template	*tmpl;
 	struct expr			*expr;
+	int8_t				direction;
 };
 
 extern struct stmt *ct_stmt_alloc(const struct location *loc,
 				  enum nft_ct_keys key,
+				  int8_t direction,
 				  struct expr *expr);
 struct dup_stmt {
 	struct expr		*to;
diff --git a/src/ct.c b/src/ct.c
index 7e09c5b246b2..2edbdacfca01 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -404,7 +404,8 @@ void ct_expr_update_type(struct proto_ctx *ctx, struct expr *expr)
 
 static void ct_stmt_print(const struct stmt *stmt)
 {
-	printf("ct %s set ", ct_templates[stmt->ct.key].token);
+	ct_print(stmt->ct.key, stmt->ct.direction);
+	printf(" set ");
 	expr_print(stmt->ct.expr);
 }
 
@@ -415,7 +416,7 @@ static const struct stmt_ops ct_stmt_ops = {
 };
 
 struct stmt *ct_stmt_alloc(const struct location *loc, enum nft_ct_keys key,
-			     struct expr *expr)
+			   int8_t direction, struct expr *expr)
 {
 	struct stmt *stmt;
 
@@ -423,6 +424,8 @@ struct stmt *ct_stmt_alloc(const struct location *loc, enum nft_ct_keys key,
 	stmt->ct.key	= key;
 	stmt->ct.tmpl	= &ct_templates[key];
 	stmt->ct.expr	= expr;
+	stmt->ct.direction = direction;
+
 	return stmt;
 }
 
diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c
index 48968442d9bc..fe3c865cab54 100644
--- a/src/netlink_delinearize.c
+++ b/src/netlink_delinearize.c
@@ -657,6 +657,7 @@ static void netlink_parse_ct_stmt(struct netlink_parse_ctx *ctx,
 	uint32_t key;
 	struct stmt *stmt;
 	struct expr *expr;
+	int8_t dir = -1;
 
 	sreg = netlink_parse_register(nle, NFTNL_EXPR_CT_SREG);
 	expr = netlink_get_register(ctx, loc, sreg);
@@ -664,8 +665,11 @@ static void netlink_parse_ct_stmt(struct netlink_parse_ctx *ctx,
 		return netlink_error(ctx, loc,
 				     "ct statement has no expression");
 
+	if (nftnl_expr_is_set(nle, NFTNL_EXPR_CT_DIR))
+		dir = nftnl_expr_get_u8(nle, NFTNL_EXPR_CT_DIR);
+
 	key  = nftnl_expr_get_u32(nle, NFTNL_EXPR_CT_KEY);
-	stmt = ct_stmt_alloc(loc, key, expr);
+	stmt = ct_stmt_alloc(loc, key, dir, expr);
 	expr_set_type(expr, stmt->ct.tmpl->dtype, stmt->ct.tmpl->byteorder);
 
 	ctx->stmt = stmt;
diff --git a/src/netlink_linearize.c b/src/netlink_linearize.c
index 5030135cd5d5..9979b7867715 100644
--- a/src/netlink_linearize.c
+++ b/src/netlink_linearize.c
@@ -1150,6 +1150,10 @@ static void netlink_gen_ct_stmt(struct netlink_linearize_ctx *ctx,
 	nle = alloc_nft_expr("ct");
 	netlink_put_register(nle, NFTNL_EXPR_CT_SREG, sreg);
 	nftnl_expr_set_u32(nle, NFTNL_EXPR_CT_KEY, stmt->ct.key);
+	if (stmt->ct.direction >= 0)
+		nftnl_expr_set_u8(nle, NFTNL_EXPR_CT_DIR,
+				  stmt->ct.direction);
+
 	nftnl_rule_add_expr(ctx->nlr, nle);
 }
 
diff --git a/src/parser_bison.y b/src/parser_bison.y
index 14d924810f9a..61ecf08c747e 100644
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -2969,7 +2969,7 @@ ct_key_dir_optional	:	BYTES		{ $$ = NFT_CT_BYTES; }
 
 ct_stmt			:	CT	ct_key		SET	expr
 			{
-				$$ = ct_stmt_alloc(&@$, $2, $4);
+				$$ = ct_stmt_alloc(&@$, $2, -1, $4);
 			}
 			|	CT	STRING		SET	expr
 			{
@@ -2982,7 +2982,20 @@ ct_stmt			:	CT	ct_key		SET	expr
 					YYERROR;
 				}
 
-				$$ = ct_stmt_alloc(&@$, key, $4);
+				$$ = ct_stmt_alloc(&@$, key, -1, $4);
+			}
+			|	CT	STRING	ct_key_dir_optional SET	expr
+			{
+				struct error_record *erec;
+				int8_t direction;
+
+				erec = ct_dir_parse(&@$, $2, &direction);
+				if (erec != NULL) {
+					erec_queue(erec, state->msgs);
+					YYERROR;
+				}
+
+				$$ = ct_stmt_alloc(&@$, $3, direction, $5);
 			}
 			;
 
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH nftables 9/9] tests: add test entries for conntrack zones
  2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
                   ` (7 preceding siblings ...)
  2017-02-03 12:35 ` [PATCH nftables 8/9] src: support zone set statement with optional direction Florian Westphal
@ 2017-02-03 12:35 ` Florian Westphal
  8 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2017-02-03 12:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 tests/py/any/ct.t         | 13 +++++++++++++
 tests/py/any/ct.t.payload | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/tests/py/any/ct.t b/tests/py/any/ct.t
index 2cfbfe13ccd2..6f32d29c0c40 100644
--- a/tests/py/any/ct.t
+++ b/tests/py/any/ct.t
@@ -100,6 +100,19 @@ ct label 127;ok
 ct label set 127;ok
 ct label 128;fail
 
+ct zone 0;ok
+ct zone 23;ok
+ct zone 65536;fail
+ct both zone 1;fail
+ct original zone 1;ok
+ct reply zone 1;ok
+
+ct zone set 1;ok
+ct original zone set 1;ok
+ct reply zone set 1;ok
+ct zone set mark map { 1 : 1,  2 : 2 };ok;ct zone set mark map { 0x00000001 : 1, 0x00000002 : 2}
+ct both zone set 1;fail
+
 ct invalid;fail
 ct invalid original;fail
 ct set invalid original 42;fail
diff --git a/tests/py/any/ct.t.payload b/tests/py/any/ct.t.payload
index 3370bcac0594..e4c7f62b69f5 100644
--- a/tests/py/any/ct.t.payload
+++ b/tests/py/any/ct.t.payload
@@ -402,6 +402,50 @@ ip test-ip4 output
   [ immediate reg 1 0x00000000 0x00000000 0x00000000 0x80000000 ]
   [ ct set label with reg 1 ]
 
+# ct zone 0
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x00000000 ]
+
+# ct zone 23
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x00000017 ]
+
+# ct original zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir original ]
+  [ cmp eq reg 1 0x00000001 ]
+
+# ct reply zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir reply ]
+  [ cmp eq reg 1 0x00000001 ]
+
+# ct zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 ]
+
+# ct original zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 , dir original ]
+
+# ct reply zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 , dir reply ]
+
+# ct zone set mark map { 1 : 1,  2 : 2 }
+__map%d test-ip4 b
+__map%d test-ip4 0
+        element 00000001  : 00000001 0 [end]    element 00000002  : 00000002 0 [end]
+ip test-ip4 output
+  [ meta load mark => reg 1 ]
+  [ lookup reg 1 set __map%d dreg 1 ]
+  [ ct set zone with reg 1 ]
+
 # notrack
 ip test-ip4 output
   [ notrack ]
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-03 12:35 ` [PATCH nftables 5/9] src: add host byte order integer type Florian Westphal
@ 2017-02-06 17:31   ` Pablo Neira Ayuso
  2017-02-06 18:17     ` Pablo Neira Ayuso
  2017-02-06 22:33     ` Florian Westphal
  0 siblings, 2 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-06 17:31 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Feb 03, 2017 at 01:35:52PM +0100, Florian Westphal wrote:
> diff --git a/include/datatype.h b/include/datatype.h
> index 9f127f2954e3..8c1c827253be 100644
> --- a/include/datatype.h
> +++ b/include/datatype.h
> @@ -82,6 +82,7 @@ enum datatypes {
>  	TYPE_DSCP,
>  	TYPE_ECN,
>  	TYPE_FIB_ADDR,
> +	TYPE_U32,
>  	__TYPE_MAX
>  };
>  #define TYPE_MAX		(__TYPE_MAX - 1)

Right, this is a real problem with host byteorder integer, the
bytecode that we generate is not correct.

I have a patch to avoid this, it's still incomplete. I'm attaching it.

Note this is still incomplete, since this doesn't solve the netlink
delinearize path. We can use the NFT_SET_USERDATA area and the tlv
infrastructure that Carlos made in summer to store this
metainformation that is only useful to

This shouldn't be a showstopper to get kernel patches in, we have a
bit of time ahead to solve this userspace issue.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-06 17:31   ` Pablo Neira Ayuso
@ 2017-02-06 18:17     ` Pablo Neira Ayuso
  2017-02-06 22:33     ` Florian Westphal
  1 sibling, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-06 18:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]

On Mon, Feb 06, 2017 at 06:31:10PM +0100, Pablo Neira Ayuso wrote:
> On Fri, Feb 03, 2017 at 01:35:52PM +0100, Florian Westphal wrote:
> > diff --git a/include/datatype.h b/include/datatype.h
> > index 9f127f2954e3..8c1c827253be 100644
> > --- a/include/datatype.h
> > +++ b/include/datatype.h
> > @@ -82,6 +82,7 @@ enum datatypes {
> >  	TYPE_DSCP,
> >  	TYPE_ECN,
> >  	TYPE_FIB_ADDR,
> > +	TYPE_U32,
> >  	__TYPE_MAX
> >  };
> >  #define TYPE_MAX		(__TYPE_MAX - 1)
> 
> Right, this is a real problem with host byteorder integer, the
> bytecode that we generate is not correct.
> 
> I have a patch to avoid this, it's still incomplete. I'm attaching it.
> 
> Note this is still incomplete, since this doesn't solve the netlink
> delinearize path. We can use the NFT_SET_USERDATA area and the tlv
> infrastructure that Carlos made in summer to store this
> metainformation that is only useful to
> 
> This shouldn't be a showstopper to get kernel patches in, we have a
> bit of time ahead to solve this userspace issue.

Forgot attachment, here it comes.

[-- Attachment #2: 0001-evaluate-store-byteorder-for-set-keys-patch --]
[-- Type: text/plain, Size: 4245 bytes --]

>From 52594fb4130560e2072524193296019d209e3bf8 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 6 Sep 2016 19:49:05 +0200
Subject: [PATCH] evaluate: store byteorder for set keys

Selectors that rely on the integer type and expect host endian
byteorder don't work properly.

We need to keep the byteorder around based on the left hand size
expression that provides the context, so store the byteorder when
evaluating the map.

Before this patch.

 # nft --debug=netlink add rule x y meta mark set meta cpu map { 0 : 1, 1 : 2 }
 __map%d x b
 __map%d x 0
        element 00000000  : 00000001 0 [end]    element 01000000  : 00000002 0 [end]
                                                        ^^^^^^^^

This is expressed in network byteorder, because the invalid byteorder
defaults on this.

After this patch:

 # nft --debug=netlink add rule x y meta mark set meta cpu map { 0 : 1, 1 : 2 }
 __map%d x b
 __map%d x 0
        element 00000000  : 00000001 0 [end]    element 00000001  : 00000002 0 [end]
                                                        ^^^^^^^^

This is in host byteorder, as the key selector in the map mandates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/rule.h |  3 +++
 src/evaluate.c | 23 +++++++++++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/rule.h b/include/rule.h
index 99e92ee..ae9e462 100644
--- a/include/rule.h
+++ b/include/rule.h
@@ -209,6 +209,8 @@ enum set_flags {
 	SET_F_EVAL		= 0x20,
 };
 
+#include <datatype.h>
+
 /**
  * struct set - nftables set
  *
@@ -237,6 +239,7 @@ struct set {
 	uint64_t		timeout;
 	const struct datatype	*keytype;
 	unsigned int		keylen;
+	enum byteorder		keybyteorder;
 	const struct datatype	*datatype;
 	unsigned int		datalen;
 	struct expr		*init;
diff --git a/src/evaluate.c b/src/evaluate.c
index 45af329..b64dfc1 100644
--- a/src/evaluate.c
+++ b/src/evaluate.c
@@ -63,6 +63,7 @@ static struct expr *implicit_set_declaration(struct eval_ctx *ctx,
 					     const char *name,
 					     const struct datatype *keytype,
 					     unsigned int keylen,
+					     enum byteorder byteorder,
 					     struct expr *expr)
 {
 	struct cmd *cmd;
@@ -70,11 +71,12 @@ static struct expr *implicit_set_declaration(struct eval_ctx *ctx,
 	struct handle h;
 
 	set = set_alloc(&expr->location);
-	set->flags	= SET_F_ANONYMOUS | expr->set_flags;
-	set->handle.set = xstrdup(name),
-	set->keytype 	= keytype;
-	set->keylen	= keylen;
-	set->init	= expr;
+	set->flags	  = SET_F_ANONYMOUS | expr->set_flags;
+	set->handle.set   = xstrdup(name),
+	set->keytype 	  = keytype;
+	set->keylen	  = keylen;
+	set->keybyteorder = byteorder;
+	set->init	  = expr;
 
 	if (ctx->table != NULL)
 		list_add_tail(&set->list, &ctx->table->sets);
@@ -1104,7 +1106,9 @@ static int expr_evaluate_map(struct eval_ctx *ctx, struct expr **expr)
 	case EXPR_SET:
 		mappings = implicit_set_declaration(ctx, "__map%d",
 						    ctx->ectx.dtype,
-						    ctx->ectx.len, mappings);
+						    ctx->ectx.len,
+						    ctx->ectx.byteorder,
+						    mappings);
 		mappings->set->datatype = ectx.dtype;
 		mappings->set->datalen  = ectx.len;
 
@@ -1159,7 +1163,8 @@ static int expr_evaluate_mapping(struct eval_ctx *ctx, struct expr **expr)
 	if (!(set->flags & SET_F_MAP))
 		return set_error(ctx, set, "set is not a map");
 
-	expr_set_context(&ctx->ectx, set->keytype, set->keylen);
+	__expr_set_context(&ctx->ectx, set->keytype, set->keybyteorder,
+			   set->keylen, 0);
 	if (expr_evaluate(ctx, &mapping->left) < 0)
 		return -1;
 	if (!expr_is_constant(mapping->left))
@@ -1443,6 +1448,7 @@ static int expr_evaluate_relational(struct eval_ctx *ctx, struct expr **expr)
 			right = rel->right =
 				implicit_set_declaration(ctx, "__set%d",
 							 left->dtype,
+							 left->byteorder,
 							 left->len, right);
 			break;
 		case EXPR_SET_REF:
@@ -1818,7 +1824,8 @@ static int stmt_evaluate_flow(struct eval_ctx *ctx, struct stmt *stmt)
 		set->set_flags |= SET_F_TIMEOUT;
 
 	setref = implicit_set_declaration(ctx, stmt->flow.table ?: "__ft%d",
-					  key->dtype, key->len, set);
+					  key->dtype, key->dtype->byteorder,
+					  key->len, set);
 
 	stmt->flow.set = setref;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-06 17:31   ` Pablo Neira Ayuso
  2017-02-06 18:17     ` Pablo Neira Ayuso
@ 2017-02-06 22:33     ` Florian Westphal
  2017-02-07 11:58       ` Pablo Neira Ayuso
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2017-02-06 22:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Fri, Feb 03, 2017 at 01:35:52PM +0100, Florian Westphal wrote:
> > diff --git a/include/datatype.h b/include/datatype.h
> > index 9f127f2954e3..8c1c827253be 100644
> > --- a/include/datatype.h
> > +++ b/include/datatype.h
> > @@ -82,6 +82,7 @@ enum datatypes {
> >  	TYPE_DSCP,
> >  	TYPE_ECN,
> >  	TYPE_FIB_ADDR,
> > +	TYPE_U32,
> >  	__TYPE_MAX
> >  };
> >  #define TYPE_MAX		(__TYPE_MAX - 1)
> 
> Right, this is a real problem with host byteorder integer, the
> bytecode that we generate is not correct.
> 
> I have a patch to avoid this, it's still incomplete. I'm attaching it.
> 
> Note this is still incomplete, since this doesn't solve the netlink
> delinearize path. We can use the NFT_SET_USERDATA area and the tlv
> infrastructure that Carlos made in summer to store this
> metainformation that is only useful to
> 
> This shouldn't be a showstopper to get kernel patches in, we have a
> bit of time ahead to solve this userspace issue.

I don't understand why all this fuss is required.

The type always enocodes/decides the endianess, so I fail to see why we
need to store endianess also in the templates (f.e. meta_templates[],
it just seems 100% redundant ...)

Thats why it I added this host endian thing here, we could then replace

[NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_type, 4 * BITS_PER_BYTE, BYTEORDER_HOST_ENDIAN),
with
[NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_u32, 4 * BITS_PER_BYTE),

and don't need this 'endianess override' in the templates anymore.
We might even be able to get rid of the endianess storage in the eval
context this way.

What am I missing?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-06 22:33     ` Florian Westphal
@ 2017-02-07 11:58       ` Pablo Neira Ayuso
  2017-02-07 12:29         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-07 11:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Mon, Feb 06, 2017 at 11:33:01PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Fri, Feb 03, 2017 at 01:35:52PM +0100, Florian Westphal wrote:
> > > diff --git a/include/datatype.h b/include/datatype.h
> > > index 9f127f2954e3..8c1c827253be 100644
> > > --- a/include/datatype.h
> > > +++ b/include/datatype.h
> > > @@ -82,6 +82,7 @@ enum datatypes {
> > >  	TYPE_DSCP,
> > >  	TYPE_ECN,
> > >  	TYPE_FIB_ADDR,
> > > +	TYPE_U32,
> > >  	__TYPE_MAX
> > >  };
> > >  #define TYPE_MAX		(__TYPE_MAX - 1)
> > 
> > Right, this is a real problem with host byteorder integer, the
> > bytecode that we generate is not correct.
> > 
> > I have a patch to avoid this, it's still incomplete. I'm attaching it.
> > 
> > Note this is still incomplete, since this doesn't solve the netlink
> > delinearize path. We can use the NFT_SET_USERDATA area and the tlv
> > infrastructure that Carlos made in summer to store this
> > metainformation that is only useful to
> > 
> > This shouldn't be a showstopper to get kernel patches in, we have a
> > bit of time ahead to solve this userspace issue.
> 
> I don't understand why all this fuss is required.
> 
> The type always enocodes/decides the endianess, so I fail to see why we
> need to store endianess also in the templates (f.e. meta_templates[],
> it just seems 100% redundant ...)
> 
> Thats why it I added this host endian thing here, we could then replace
> 
> [NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_type, 4 * BITS_PER_BYTE, BYTEORDER_HOST_ENDIAN),
> with
> [NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_u32, 4 * BITS_PER_BYTE),
>
> and don't need this 'endianess override' in the templates anymore.
> We might even be able to get rid of the endianess storage in the eval
> context this way.
> 
> What am I missing?

This approach will not work for more stuff coming ahead, eg.
concatenation support for integer datatypes. This currently doesn't
work since nft complains on concatenation with datatypes with not
fixed datatypes.

In that case, we will end up having integers of different sizes and
byteorder. We would need one hardcoded type for each variant.

Note that these TYPE_XYZ are ABI too, so we should avoid changes once
we push them into nftables.git. We also have a limited number of them,
6 bits since concatenations places up to 5 datatypes there using bit
shifts.

I understand the override is not nice, that's one of the reasons why I
did not push this yet. Probably we can allocate datatype instances
dynamically, so we don't need this new field.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nftables 5/9] src: add host byte order integer type
  2017-02-07 11:58       ` Pablo Neira Ayuso
@ 2017-02-07 12:29         ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-07 12:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, Feb 07, 2017 at 12:58:56PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Feb 06, 2017 at 11:33:01PM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > On Fri, Feb 03, 2017 at 01:35:52PM +0100, Florian Westphal wrote:
> > > > diff --git a/include/datatype.h b/include/datatype.h
> > > > index 9f127f2954e3..8c1c827253be 100644
> > > > --- a/include/datatype.h
> > > > +++ b/include/datatype.h
> > > > @@ -82,6 +82,7 @@ enum datatypes {
> > > >  	TYPE_DSCP,
> > > >  	TYPE_ECN,
> > > >  	TYPE_FIB_ADDR,
> > > > +	TYPE_U32,
> > > >  	__TYPE_MAX
> > > >  };
> > > >  #define TYPE_MAX		(__TYPE_MAX - 1)
> > > 
> > > Right, this is a real problem with host byteorder integer, the
> > > bytecode that we generate is not correct.
> > > 
> > > I have a patch to avoid this, it's still incomplete. I'm attaching it.
> > > 
> > > Note this is still incomplete, since this doesn't solve the netlink
> > > delinearize path. We can use the NFT_SET_USERDATA area and the tlv
> > > infrastructure that Carlos made in summer to store this
> > > metainformation that is only useful to
> > > 
> > > This shouldn't be a showstopper to get kernel patches in, we have a
> > > bit of time ahead to solve this userspace issue.
> > 
> > I don't understand why all this fuss is required.
> > 
> > The type always enocodes/decides the endianess, so I fail to see why we
> > need to store endianess also in the templates (f.e. meta_templates[],
> > it just seems 100% redundant ...)
> > 
> > Thats why it I added this host endian thing here, we could then replace
> > 
> > [NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_type, 4 * BITS_PER_BYTE, BYTEORDER_HOST_ENDIAN),
> > with
> > [NFT_META_CPU]  = META_TEMPLATE("cpu", &integer_u32, 4 * BITS_PER_BYTE),
> >
> > and don't need this 'endianess override' in the templates anymore.
> > We might even be able to get rid of the endianess storage in the eval
> > context this way.
> > 
> > What am I missing?
> 
> This approach will not work for more stuff coming ahead, eg.
> concatenation support for integer datatypes. This currently doesn't
> work since nft complains on concatenation with datatypes with not
> fixed datatypes.
> 
> In that case, we will end up having integers of different sizes and
> byteorder. We would need one hardcoded type for each variant.
> 
> Note that these TYPE_XYZ are ABI too, so we should avoid changes once
> we push them into nftables.git. We also have a limited number of them,
> 6 bits since concatenations places up to 5 datatypes there using bit
> shifts.
> 
> I understand the override is not nice, that's one of the reasons why I
> did not push this yet. Probably we can allocate datatype instances
> dynamically, so we don't need this new field.

Oh, to add more info: There is one more corner case we have to
support:

        add rule x y ip saddr . meta cpu eq 1.1.1.1 . 10

When using sets, we can stash the datatype, byteorder and size in
NFTA_SET_USERDATA. In this case above, we have to infer it from the
LHS of the relational that defines the concatenation. I guess this
will require a bit of code from rule_postprocess().

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support
  2017-02-03 12:35 ` [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support Florian Westphal
@ 2017-02-08  9:28   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-08  9:28 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Feb 03, 2017 at 01:35:48PM +0100, Florian Westphal wrote:
> Just like with counters the direction attribute is optional.
> We set priv->dir to MAX unconditionally to avoid duplicating the assignment
> for all keys with optional direction.
> 
> For keys where direction is mandatory, existing code already returns
> an error.

Applied, thanks Florian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind
  2017-02-03 12:35 ` [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind Florian Westphal
@ 2017-02-08  9:29   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-08  9:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Feb 03, 2017 at 01:35:49PM +0100, Florian Westphal wrote:
> Next patch will add ZONE_ID set support which will need similar
> error unwind (put operation) as conntrack labels.
> 
> Prepare for this: remove the 'label_got' boolean in favor
> of a switch statement that can be extended in next patch.
> 
> As we already have that in the set_destroy function place that in
> a separate function and call it from the set init function.

Also applied, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support
  2017-02-03 12:35 ` [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support Florian Westphal
@ 2017-02-08  9:29   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-08  9:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Feb 03, 2017 at 01:35:50PM +0100, Florian Westphal wrote:
> zones allow tracking multiple connections sharing identical tuples,
> this is needed e.g. when tracking distinct vlans with overlapping ip
> addresses (conntrack is l2 agnostic).
> 
> Thus the zone has to be set before the packet is picked up by the
> connection tracker.  This is done by means of 'conntrack templates' which
> are conntrack structures used solely to pass this info from one netfilter
> hook to the next.
> 
> The iptables CT target instantiates these connection tracking templates
> once per rule, i.e. the template is fixed/tied to particular zone, can
> be read-only and therefore be re-used by as many skbs simultaneously as
> needed.
> 
> We can't follow this model because we want to take the zone id from
> an sreg at rule eval time so we could e.g. fill in the zone id from
> the packets vlan id or a e.g. nftables key : value maps.
> 
> To avoid cost of per packet alloc/free of the template, use a percpu
> template 'scratch' object and use the refcount to detect the (unlikely)
> case where the template is still attached to another skb (i.e., previous
> skb was nfqueued ...).

Applied, thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH libnftnl 4/9] src: ct: add zone support
  2017-02-03 12:35 ` [PATCH libnftnl 4/9] src: ct: add zone support Florian Westphal
@ 2017-02-19 19:22   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2017-02-19 19:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Feb 03, 2017 at 01:35:51PM +0100, Florian Westphal wrote:
> Signed-off-by: Florian Westphal <fw@strlen.de>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Please, push this chunk for libnftnl into git.netfilter.org, Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-02-19 19:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-03 12:35 [PATCH -next 0/9] nftables: add zone support to ct statement Florian Westphal
2017-02-03 12:35 ` [PATCH nf-next 1/9] netfilter: nft_ct: add zone id get support Florian Westphal
2017-02-08  9:28   ` Pablo Neira Ayuso
2017-02-03 12:35 ` [PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind Florian Westphal
2017-02-08  9:29   ` Pablo Neira Ayuso
2017-02-03 12:35 ` [PATCH nf-next 3/9] netfilter: nft_ct: add zone id set support Florian Westphal
2017-02-08  9:29   ` Pablo Neira Ayuso
2017-02-03 12:35 ` [PATCH libnftnl 4/9] src: ct: add zone support Florian Westphal
2017-02-19 19:22   ` Pablo Neira Ayuso
2017-02-03 12:35 ` [PATCH nftables 5/9] src: add host byte order integer type Florian Westphal
2017-02-06 17:31   ` Pablo Neira Ayuso
2017-02-06 18:17     ` Pablo Neira Ayuso
2017-02-06 22:33     ` Florian Westphal
2017-02-07 11:58       ` Pablo Neira Ayuso
2017-02-07 12:29         ` Pablo Neira Ayuso
2017-02-03 12:35 ` [PATCH nftables 6/9] src: add conntrack zone support Florian Westphal
2017-02-03 12:35 ` [PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement Florian Westphal
2017-02-03 12:35 ` [PATCH nftables 8/9] src: support zone set statement with optional direction Florian Westphal
2017-02-03 12:35 ` [PATCH nftables 9/9] tests: add test entries for conntrack zones Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).