netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
@ 2014-02-17 18:12 Nikolay Aleksandrov
  2014-02-17 18:37 ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-17 18:12 UTC (permalink / raw)
  To: netfilter-devel; +Cc: pablo, kaber, Nikolay Aleksandrov

This patch extends the payload expression to support packet writing.
The new payload attribute - SREG specifies the source register to use
when changing packet data, the rest of the attributes are the same:
base - where to start from
offset - offset in the packet
len - length to write

The DREG attribute should not be set if writing is intended, if both
attributes are set the SREG (packet writing) will take precedence.
When the expression is dumped both registers will get dumped and the
user can differentiate the type of the payload (reading/writing) by
checking if the sreg attribute is different from zero.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
I'm sending this patch early thus the RFC (I'm still testing),
just to see if you have any comments on the structure and changes. Now to
justify a few of the changes:
I added the sreg as a separate variable to struct nft_payload in order for
the user to be able to recognize which payload type is the expression when
dumping since both SREG and DREG get dumped.
I also factored the offset code in nft_payload_make_offset since it's used
by both evaluation functions, returns -1 on error.
The two init functions check only the registers as that's what is different
the rest of the attributes are still checked by the select_ops. I've also
allowed to have both SREG and DREG set but in such case SREG takes
precedence.
I left the old payload names as they were, but they can get _get or
something else if you'd like.
One question though, I saw that you've applied my previous power of 2 patch
but I didn't see it in net/net-next, is there another tree that I should be
writing the patches against ?
This patch applies to Dave's net-next tree.

 include/net/netfilter/nf_tables_core.h   |   1 +
 include/uapi/linux/netfilter/nf_tables.h |   2 +
 net/netfilter/nft_payload.c              | 101 +++++++++++++++++++++++++------
 3 files changed, 84 insertions(+), 20 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index cf2b7ae..0a0bd9a 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -32,6 +32,7 @@ struct nft_payload {
 	u8			offset;
 	u8			len;
 	enum nft_registers	dreg:8;
+	enum nft_registers	sreg:8;
 };
 
 extern const struct nft_expr_ops nft_payload_fast_ops;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 83c985a..7730a36 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -480,6 +480,7 @@ enum nft_payload_bases {
 /**
  * enum nft_payload_attributes - nf_tables payload expression netlink attributes
  *
+ * @NFTA_PAYLOAD_SREG: source register to load data from (NLA_U32: nft_registers)
  * @NFTA_PAYLOAD_DREG: destination register to load data into (NLA_U32: nft_registers)
  * @NFTA_PAYLOAD_BASE: payload base (NLA_U32: nft_payload_bases)
  * @NFTA_PAYLOAD_OFFSET: payload offset relative to base (NLA_U32)
@@ -491,6 +492,7 @@ enum nft_payload_attributes {
 	NFTA_PAYLOAD_BASE,
 	NFTA_PAYLOAD_OFFSET,
 	NFTA_PAYLOAD_LEN,
+	NFTA_PAYLOAD_SREG,
 	__NFTA_PAYLOAD_MAX
 };
 #define NFTA_PAYLOAD_MAX	(__NFTA_PAYLOAD_MAX - 1)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index a2aeb31..1b42668 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -17,23 +17,19 @@
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
 
-static void nft_payload_eval(const struct nft_expr *expr,
-			     struct nft_data data[NFT_REG_MAX + 1],
-			     const struct nft_pktinfo *pkt)
+static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
+				   const struct nft_payload *payload)
 {
-	const struct nft_payload *priv = nft_expr_priv(expr);
-	const struct sk_buff *skb = pkt->skb;
-	struct nft_data *dest = &data[priv->dreg];
-	int offset;
+	int offset = -1;
 
-	switch (priv->base) {
+	switch (payload->base) {
 	case NFT_PAYLOAD_LL_HEADER:
-		if (!skb_mac_header_was_set(skb))
-			goto err;
-		offset = skb_mac_header(skb) - skb->data;
+		if (!skb_mac_header_was_set(pkt->skb))
+			return offset;
+		offset = skb_mac_header(pkt->skb) - pkt->skb->data;
 		break;
 	case NFT_PAYLOAD_NETWORK_HEADER:
-		offset = skb_network_offset(skb);
+		offset = skb_network_offset(pkt->skb);
 		break;
 	case NFT_PAYLOAD_TRANSPORT_HEADER:
 		offset = pkt->xt.thoff;
@@ -41,16 +37,49 @@ static void nft_payload_eval(const struct nft_expr *expr,
 	default:
 		BUG();
 	}
-	offset += priv->offset;
+	offset += payload->offset;
+
+	return offset;
+}
+
+static void nft_payload_eval(const struct nft_expr *expr,
+			     struct nft_data data[NFT_REG_MAX + 1],
+			     const struct nft_pktinfo *pkt)
+{
+	const struct nft_payload *priv = nft_expr_priv(expr);
+	struct nft_data *dest = &data[priv->dreg];
+	int offset = nft_payload_make_offset(pkt, priv);
 
-	if (skb_copy_bits(skb, offset, dest->data, priv->len) < 0)
+	if (offset == -1)
+		goto err;
+	if (skb_copy_bits(pkt->skb, offset, dest->data, priv->len) < 0)
 		goto err;
 	return;
 err:
 	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
 }
 
+static void nft_payload_set_eval(const struct nft_expr *expr,
+				 struct nft_data data[NFT_REG_MAX + 1],
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_payload *priv = nft_expr_priv(expr);
+	struct nft_data *source = &data[priv->sreg];
+	int offset = nft_payload_make_offset(pkt, priv);
+
+	if (offset == -1)
+		goto err;
+	if (!skb_make_writable(pkt->skb, offset + priv->len))
+		goto err;
+	memcpy(pkt->skb->data + offset, source, priv->len);
+
+	return;
+err:
+	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
+}
+
 static const struct nla_policy nft_payload_policy[NFTA_PAYLOAD_MAX + 1] = {
+	[NFTA_PAYLOAD_SREG]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_DREG]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_BASE]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_OFFSET]	= { .type = NLA_U32 },
@@ -75,11 +104,29 @@ static int nft_payload_init(const struct nft_ctx *ctx,
 	return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE);
 }
 
+static int nft_payload_set_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_payload *priv = nft_expr_priv(expr);
+	int err;
+
+	priv->base   = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
+	priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
+	priv->len    = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+
+	priv->sreg = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_SREG]));
+	err = nft_validate_input_register(priv->sreg);
+
+	return err;
+}
+
 static int nft_payload_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_payload *priv = nft_expr_priv(expr);
 
-	if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)) ||
+	if (nla_put_be32(skb, NFTA_PAYLOAD_SREG, htonl(priv->sreg)) ||
+	    nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)) ||
 	    nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) ||
 	    nla_put_be32(skb, NFTA_PAYLOAD_OFFSET, htonl(priv->offset)) ||
 	    nla_put_be32(skb, NFTA_PAYLOAD_LEN, htonl(priv->len)))
@@ -107,6 +154,14 @@ const struct nft_expr_ops nft_payload_fast_ops = {
 	.dump		= nft_payload_dump,
 };
 
+static const struct nft_expr_ops nft_payload_set_ops = {
+	.type		= &nft_payload_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload)),
+	.eval		= nft_payload_set_eval,
+	.init		= nft_payload_set_init,
+	.dump		= nft_payload_dump,
+};
+
 static const struct nft_expr_ops *
 nft_payload_select_ops(const struct nft_ctx *ctx,
 		       const struct nlattr * const tb[])
@@ -114,7 +169,8 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
 	enum nft_payload_bases base;
 	unsigned int offset, len;
 
-	if (tb[NFTA_PAYLOAD_DREG] == NULL ||
+	if ((tb[NFTA_PAYLOAD_SREG] == NULL &&
+	    tb[NFTA_PAYLOAD_DREG] == NULL) ||
 	    tb[NFTA_PAYLOAD_BASE] == NULL ||
 	    tb[NFTA_PAYLOAD_OFFSET] == NULL ||
 	    tb[NFTA_PAYLOAD_LEN] == NULL)
@@ -135,10 +191,15 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
 	if (len == 0 || len > FIELD_SIZEOF(struct nft_data, data))
 		return ERR_PTR(-EINVAL);
 
-	if (len <= 4 && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
-		return &nft_payload_fast_ops;
-	else
-		return &nft_payload_ops;
+	if (tb[NFTA_PAYLOAD_SREG]) {
+		return &nft_payload_set_ops;
+	} else {
+		if (len <= 4 && IS_ALIGNED(offset, len) &&
+		    base != NFT_PAYLOAD_LL_HEADER)
+			return &nft_payload_fast_ops;
+		else
+			return &nft_payload_ops;
+	}
 }
 
 static struct nft_expr_type nft_payload_type __read_mostly = {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-17 18:12 [RFC PATCH] netfilter: nf_tables: extend payload to support writing data Nikolay Aleksandrov
@ 2014-02-17 18:37 ` Patrick McHardy
  2014-02-17 18:43   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2014-02-17 18:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netfilter-devel, pablo

On Mon, Feb 17, 2014 at 07:12:17PM +0100, Nikolay Aleksandrov wrote:
> This patch extends the payload expression to support packet writing.
> The new payload attribute - SREG specifies the source register to use
> when changing packet data, the rest of the attributes are the same:
> base - where to start from
> offset - offset in the packet
> len - length to write
> 
> The DREG attribute should not be set if writing is intended, if both
> attributes are set the SREG (packet writing) will take precedence.
> When the expression is dumped both registers will get dumped and the
> user can differentiate the type of the payload (reading/writing) by
> checking if the sreg attribute is different from zero.

I'd suggest to return an error if both are set. We should only accept
clearly defined input and reject everything else.

> I'm sending this patch early thus the RFC (I'm still testing),
> just to see if you have any comments on the structure and changes. Now to
> justify a few of the changes:
> I added the sreg as a separate variable to struct nft_payload in order for
> the user to be able to recognize which payload type is the expression when
> dumping since both SREG and DREG get dumped.

Adding another register seems fine. I'd suggest to only dump the one
actually used though.

> I also factored the offset code in nft_payload_make_offset since it's used
> by both evaluation functions, returns -1 on error.
> The two init functions check only the registers as that's what is different
> the rest of the attributes are still checked by the select_ops. I've also
> allowed to have both SREG and DREG set but in such case SREG takes
> precedence.

See above.

> I left the old payload names as they were, but they can get _get or
> something else if you'd like.

Actually I dislike the _get and _set suffixes, so I'm happy to use them
as little as possible :)

> diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
> index a2aeb31..1b42668 100644
> --- a/net/netfilter/nft_payload.c
> +++ b/net/netfilter/nft_payload.c
> @@ -17,23 +17,19 @@
>  #include <net/netfilter/nf_tables_core.h>
>  #include <net/netfilter/nf_tables.h>
>  
> -static void nft_payload_eval(const struct nft_expr *expr,
> -			     struct nft_data data[NFT_REG_MAX + 1],
> -			     const struct nft_pktinfo *pkt)
> +static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
> +				   const struct nft_payload *payload)

We're using priv everywhere. Please keep this consistent. Also please
make sure (in the entire patch) that arguments are aligned with the
opening (.

>  {
> -	const struct nft_payload *priv = nft_expr_priv(expr);
> -	const struct sk_buff *skb = pkt->skb;
> -	struct nft_data *dest = &data[priv->dreg];
> -	int offset;
> +	int offset = -1;
>  
> -	switch (priv->base) {
> +	switch (payload->base) {
>  	case NFT_PAYLOAD_LL_HEADER:
> -		if (!skb_mac_header_was_set(skb))
> -			goto err;
> -		offset = skb_mac_header(skb) - skb->data;
> +		if (!skb_mac_header_was_set(pkt->skb))

Why don't you keep the local skb pointer?

> +			return offset;
> +		offset = skb_mac_header(pkt->skb) - pkt->skb->data;
>  		break;
>  	case NFT_PAYLOAD_NETWORK_HEADER:
> -		offset = skb_network_offset(skb);
> +		offset = skb_network_offset(pkt->skb);
>  		break;
>  	case NFT_PAYLOAD_TRANSPORT_HEADER:
>  		offset = pkt->xt.thoff;

> +static void nft_payload_set_eval(const struct nft_expr *expr,
> +				 struct nft_data data[NFT_REG_MAX + 1],
> +				 const struct nft_pktinfo *pkt)
> +{
> +	const struct nft_payload *priv = nft_expr_priv(expr);
> +	struct nft_data *source = &data[priv->sreg];
> +	int offset = nft_payload_make_offset(pkt, priv);
> +
> +	if (offset == -1)
> +		goto err;
> +	if (!skb_make_writable(pkt->skb, offset + priv->len))
> +		goto err;
> +	memcpy(pkt->skb->data + offset, source, priv->len);

We need to take care of checksumming. Shouldn't be too hard to get *most*
cases right using the existing checksum helpers.

> +
> +	return;
> +err:
> +	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
> +}
> +

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-17 18:37 ` Patrick McHardy
@ 2014-02-17 18:43   ` Nikolay Aleksandrov
  2014-02-17 18:46     ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-17 18:43 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, pablo

On 02/17/2014 07:37 PM, Patrick McHardy wrote:
> On Mon, Feb 17, 2014 at 07:12:17PM +0100, Nikolay Aleksandrov wrote:
>> This patch extends the payload expression to support packet writing.
>> The new payload attribute - SREG specifies the source register to use
>> when changing packet data, the rest of the attributes are the same:
>> base - where to start from
>> offset - offset in the packet
>> len - length to write
>>
>> The DREG attribute should not be set if writing is intended, if both
>> attributes are set the SREG (packet writing) will take precedence.
>> When the expression is dumped both registers will get dumped and the
>> user can differentiate the type of the payload (reading/writing) by
>> checking if the sreg attribute is different from zero.
> 
> I'd suggest to return an error if both are set. We should only accept
> clearly defined input and reject everything else.
> 
Okay, will do.

>> I'm sending this patch early thus the RFC (I'm still testing),
>> just to see if you have any comments on the structure and changes. Now to
>> justify a few of the changes:
>> I added the sreg as a separate variable to struct nft_payload in order for
>> the user to be able to recognize which payload type is the expression when
>> dumping since both SREG and DREG get dumped.
> 
> Adding another register seems fine. I'd suggest to only dump the one
> actually used though.
> 
Given the previous comment, yep :)

>> I also factored the offset code in nft_payload_make_offset since it's used
>> by both evaluation functions, returns -1 on error.
>> The two init functions check only the registers as that's what is different
>> the rest of the attributes are still checked by the select_ops. I've also
>> allowed to have both SREG and DREG set but in such case SREG takes
>> precedence.
> 
> See above.
> 
>> I left the old payload names as they were, but they can get _get or
>> something else if you'd like.
> 
> Actually I dislike the _get and _set suffixes, so I'm happy to use them
> as little as possible :)
> 
>> diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
>> index a2aeb31..1b42668 100644
>> --- a/net/netfilter/nft_payload.c
>> +++ b/net/netfilter/nft_payload.c
>> @@ -17,23 +17,19 @@
>>  #include <net/netfilter/nf_tables_core.h>
>>  #include <net/netfilter/nf_tables.h>
>>  
>> -static void nft_payload_eval(const struct nft_expr *expr,
>> -			     struct nft_data data[NFT_REG_MAX + 1],
>> -			     const struct nft_pktinfo *pkt)
>> +static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
>> +				   const struct nft_payload *payload)
> 
> We're using priv everywhere. Please keep this consistent. Also please
> make sure (in the entire patch) that arguments are aligned with the
> opening (.
> 
Okay, I'll use the priv.
They're adjusted, apply the patch and see for yourself, or do you mean some
other adjustment ?
Both arguments are aligned from what I can see.

>>  {
>> -	const struct nft_payload *priv = nft_expr_priv(expr);
>> -	const struct sk_buff *skb = pkt->skb;
>> -	struct nft_data *dest = &data[priv->dreg];
>> -	int offset;
>> +	int offset = -1;
>>  
>> -	switch (priv->base) {
>> +	switch (payload->base) {
>>  	case NFT_PAYLOAD_LL_HEADER:
>> -		if (!skb_mac_header_was_set(skb))
>> -			goto err;
>> -		offset = skb_mac_header(skb) - skb->data;
>> +		if (!skb_mac_header_was_set(pkt->skb))
> 
> Why don't you keep the local skb pointer?
> 
Anyway is fine by me, I'll leave it.

>> +			return offset;
>> +		offset = skb_mac_header(pkt->skb) - pkt->skb->data;
>>  		break;
>>  	case NFT_PAYLOAD_NETWORK_HEADER:
>> -		offset = skb_network_offset(skb);
>> +		offset = skb_network_offset(pkt->skb);
>>  		break;
>>  	case NFT_PAYLOAD_TRANSPORT_HEADER:
>>  		offset = pkt->xt.thoff;
> 
>> +static void nft_payload_set_eval(const struct nft_expr *expr,
>> +				 struct nft_data data[NFT_REG_MAX + 1],
>> +				 const struct nft_pktinfo *pkt)
>> +{
>> +	const struct nft_payload *priv = nft_expr_priv(expr);
>> +	struct nft_data *source = &data[priv->sreg];
>> +	int offset = nft_payload_make_offset(pkt, priv);
>> +
>> +	if (offset == -1)
>> +		goto err;
>> +	if (!skb_make_writable(pkt->skb, offset + priv->len))
>> +		goto err;
>> +	memcpy(pkt->skb->data + offset, source, priv->len);
> 
> We need to take care of checksumming. Shouldn't be too hard to get *most*
> cases right using the existing checksum helpers.
> 
Yes, but would you like to do that in here or have a separate op which
is configurable i.e. you can set what to calculate and where to put it,
or would you prefer to make it automatic based on what we're changing here ?

>> +
>> +	return;
>> +err:
>> +	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
>> +}
>> +
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-17 18:43   ` Nikolay Aleksandrov
@ 2014-02-17 18:46     ` Patrick McHardy
  2014-02-19 16:12       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2014-02-17 18:46 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netfilter-devel, pablo

On Mon, Feb 17, 2014 at 07:43:38PM +0100, Nikolay Aleksandrov wrote:
> On 02/17/2014 07:37 PM, Patrick McHardy wrote:
> >>  
> >> -static void nft_payload_eval(const struct nft_expr *expr,
> >> -			     struct nft_data data[NFT_REG_MAX + 1],
> >> -			     const struct nft_pktinfo *pkt)
> >> +static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
> >> +				   const struct nft_payload *payload)
> > 
> > We're using priv everywhere. Please keep this consistent. Also please
> > make sure (in the entire patch) that arguments are aligned with the
> > opening (.
> > 
> Okay, I'll use the priv.
> They're adjusted, apply the patch and see for yourself, or do you mean some
> other adjustment ?
> Both arguments are aligned from what I can see.

I see. Probably misrepresented by mutt.

> >> +static void nft_payload_set_eval(const struct nft_expr *expr,
> >> +				 struct nft_data data[NFT_REG_MAX + 1],
> >> +				 const struct nft_pktinfo *pkt)
> >> +{
> >> +	const struct nft_payload *priv = nft_expr_priv(expr);
> >> +	struct nft_data *source = &data[priv->sreg];
> >> +	int offset = nft_payload_make_offset(pkt, priv);
> >> +
> >> +	if (offset == -1)
> >> +		goto err;
> >> +	if (!skb_make_writable(pkt->skb, offset + priv->len))
> >> +		goto err;
> >> +	memcpy(pkt->skb->data + offset, source, priv->len);
> > 
> > We need to take care of checksumming. Shouldn't be too hard to get *most*
> > cases right using the existing checksum helpers.
> > 
> Yes, but would you like to do that in here or have a separate op which
> is configurable i.e. you can set what to calculate and where to put it,
> or would you prefer to make it automatic based on what we're changing here ?

Something here is a lot cheaper since we can use incremental checksumming.
That won't be possible somewhere else, additionally we'd have to check
the checksum before recalculating it to make sure we don't fix up bad
packets. So yes, I think it should be done here.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-17 18:46     ` Patrick McHardy
@ 2014-02-19 16:12       ` Nikolay Aleksandrov
  2014-02-19 16:25         ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-19 16:12 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, pablo

On 02/17/2014 07:46 PM, Patrick McHardy wrote:
> On Mon, Feb 17, 2014 at 07:43:38PM +0100, Nikolay Aleksandrov wrote:
>> On 02/17/2014 07:37 PM, Patrick McHardy wrote:
>>>>  
>>>> -static void nft_payload_eval(const struct nft_expr *expr,
>>>> -			     struct nft_data data[NFT_REG_MAX + 1],
>>>> -			     const struct nft_pktinfo *pkt)
>>>> +static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
>>>> +				   const struct nft_payload *payload)
>>>
>>> We're using priv everywhere. Please keep this consistent. Also please
>>> make sure (in the entire patch) that arguments are aligned with the
>>> opening (.
>>>
>> Okay, I'll use the priv.
>> They're adjusted, apply the patch and see for yourself, or do you mean some
>> other adjustment ?
>> Both arguments are aligned from what I can see.
> 
> I see. Probably misrepresented by mutt.
> 
>>>> +static void nft_payload_set_eval(const struct nft_expr *expr,
>>>> +				 struct nft_data data[NFT_REG_MAX + 1],
>>>> +				 const struct nft_pktinfo *pkt)
>>>> +{
>>>> +	const struct nft_payload *priv = nft_expr_priv(expr);
>>>> +	struct nft_data *source = &data[priv->sreg];
>>>> +	int offset = nft_payload_make_offset(pkt, priv);
>>>> +
>>>> +	if (offset == -1)
>>>> +		goto err;
>>>> +	if (!skb_make_writable(pkt->skb, offset + priv->len))
>>>> +		goto err;
>>>> +	memcpy(pkt->skb->data + offset, source, priv->len);
>>>
>>> We need to take care of checksumming. Shouldn't be too hard to get *most*
>>> cases right using the existing checksum helpers.
>>>
>> Yes, but would you like to do that in here or have a separate op which
>> is configurable i.e. you can set what to calculate and where to put it,
>> or would you prefer to make it automatic based on what we're changing here ?
> 
> Something here is a lot cheaper since we can use incremental checksumming.
> That won't be possible somewhere else, additionally we'd have to check
> the checksum before recalculating it to make sure we don't fix up bad
> packets. So yes, I think it should be done here.
> 
Okay, I've been working on this today and have gotten to a point where it works :)
But I have a few questions, currently I've made it so the header that you modify
is the only one that gets its checksum updated (with the exception of pseudo
header if the address gets written to), I recompute the changed bytes one at a
time because of the freedom of offset and length. Now this works fine and I can
mangle the ip/ip6 headers or the transport headers (tcp/udp, I'll look into
adding sctp as well), but there's a problem with the cross-header writing i.e.
if a write spills from the network header to the transport header, currently I
only update the network header part (correctly, only up to the bytes that were
changed inside) and leave the transport header broken.
This also applies to the LL header, if a write spills into the network header
then we'll have a broken packet.

Would you like me to add additional logic to support the cross-header writes or
leave it as it is ?

Nik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-19 16:25         ` Patrick McHardy
@ 2014-02-19 16:22           ` Nikolay Aleksandrov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-19 16:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, pablo

On 02/19/2014 05:25 PM, Patrick McHardy wrote:
> On Wed, Feb 19, 2014 at 05:12:13PM +0100, Nikolay Aleksandrov wrote:
>> On 02/17/2014 07:46 PM, Patrick McHardy wrote:
>>>>>
>>>>> We need to take care of checksumming. Shouldn't be too hard to get *most*
>>>>> cases right using the existing checksum helpers.
>>>>>
>>>> Yes, but would you like to do that in here or have a separate op which
>>>> is configurable i.e. you can set what to calculate and where to put it,
>>>> or would you prefer to make it automatic based on what we're changing here ?
>>>
>>> Something here is a lot cheaper since we can use incremental checksumming.
>>> That won't be possible somewhere else, additionally we'd have to check
>>> the checksum before recalculating it to make sure we don't fix up bad
>>> packets. So yes, I think it should be done here.
>>>
>> Okay, I've been working on this today and have gotten to a point where it works :)
>> But I have a few questions, currently I've made it so the header that you modify
>> is the only one that gets its checksum updated (with the exception of pseudo
>> header if the address gets written to), I recompute the changed bytes one at a
>> time because of the freedom of offset and length. Now this works fine and I can
>> mangle the ip/ip6 headers or the transport headers (tcp/udp, I'll look into
>> adding sctp as well), but there's a problem with the cross-header writing i.e.
>> if a write spills from the network header to the transport header, currently I
>> only update the network header part (correctly, only up to the bytes that were
>> changed inside) and leave the transport header broken.
>> This also applies to the LL header, if a write spills into the network header
>> then we'll have a broken packet.
>>
>> Would you like me to add additional logic to support the cross-header writes or
>> leave it as it is ?
> 
> I think that should be fine. 
> 
> Regarding the one byte at a time - update, how about rounding down to the
> next multiple of four, adjusting the replacement data accordingly and using
> csum_replace4() instead?
> 
I was hoping you wouldn't say that :-) (joking)
Yep, I know, I'll do it.

Thanks again

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-19 16:12       ` Nikolay Aleksandrov
@ 2014-02-19 16:25         ` Patrick McHardy
  2014-02-19 16:22           ` Nikolay Aleksandrov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2014-02-19 16:25 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netfilter-devel, pablo

On Wed, Feb 19, 2014 at 05:12:13PM +0100, Nikolay Aleksandrov wrote:
> On 02/17/2014 07:46 PM, Patrick McHardy wrote:
> >>>
> >>> We need to take care of checksumming. Shouldn't be too hard to get *most*
> >>> cases right using the existing checksum helpers.
> >>>
> >> Yes, but would you like to do that in here or have a separate op which
> >> is configurable i.e. you can set what to calculate and where to put it,
> >> or would you prefer to make it automatic based on what we're changing here ?
> > 
> > Something here is a lot cheaper since we can use incremental checksumming.
> > That won't be possible somewhere else, additionally we'd have to check
> > the checksum before recalculating it to make sure we don't fix up bad
> > packets. So yes, I think it should be done here.
> > 
> Okay, I've been working on this today and have gotten to a point where it works :)
> But I have a few questions, currently I've made it so the header that you modify
> is the only one that gets its checksum updated (with the exception of pseudo
> header if the address gets written to), I recompute the changed bytes one at a
> time because of the freedom of offset and length. Now this works fine and I can
> mangle the ip/ip6 headers or the transport headers (tcp/udp, I'll look into
> adding sctp as well), but there's a problem with the cross-header writing i.e.
> if a write spills from the network header to the transport header, currently I
> only update the network header part (correctly, only up to the bytes that were
> changed inside) and leave the transport header broken.
> This also applies to the LL header, if a write spills into the network header
> then we'll have a broken packet.
> 
> Would you like me to add additional logic to support the cross-header writes or
> leave it as it is ?

I think that should be fine. 

Regarding the one byte at a time - update, how about rounding down to the
next multiple of four, adjusting the replacement data accordingly and using
csum_replace4() instead?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
@ 2014-02-23 17:32 Nikolay Aleksandrov
  2014-02-23 18:09 ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-23 17:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber, pablo, Nikolay Aleksandrov

This patch extends the payload expression to support packet writing.
The new payload attribute - SREG specifies the source register to use
when changing packet data, the rest of the attributes are the same:
base - where to start from
offset - offset in the packet
len - length to write

The DREG attribute should not be set if writing is intended, if both
attributes are set an error will be returned.

The checksum update is done automatically for the following cases:
IPv4 checksum - changing the IPv4 header
TCP checksum - changing addresses in the network header (pseudo) or
               changing TCP header/packet data
UDP checksum - same as TCP
The pseudo header works for both IPv4 and IPv6.

The following restrictions apply:
- Cross-header writing (NH -> TH) won't get their checksum updated
  properly.
- The "checksum" fields of the respective headers should not be altered.
- If altering an address in the network header, the write should not
  alter any other field. (This is okay for IPv4 as the previous field
  is the checksum, but changing the IPv6 "hop limit" and the first bytes
  of the source address should not be done for example)

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
This is _strictly_ RFC, I have some cleanups to do and a ton of tests
to run :-)

Now some explanation, this is the simplest way of doing it that I could
think of. I also have a version which doesn't copy in order to align but
writes directly in "broken" number of bytes i.e. if we had to shift 3 bytes
in order to round offset then on each iteration it'll either write 3 bytes
or 1 byte with heavy use of bitfield manipulation and asynchronous word
counting (i.e. one for the src and one for the dst). I don't believe that
the speed difference will be big so I prefer the simpler approach, if you
have any suggestion I'm willing to change the algorithm :-)

I think the rest is in the commit message, I'll continue to clean it up and
run some more tests. I think I've addressed all comments from the previous
RFC posting.
This patch should apply to Dave's net-next tree.

 include/net/netfilter/nf_tables_core.h   |   1 +
 include/uapi/linux/netfilter/nf_tables.h |   2 +
 net/netfilter/nft_payload.c              | 255 +++++++++++++++++++++++++++++--
 3 files changed, 243 insertions(+), 15 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index cf2b7ae..0a0bd9a 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -32,6 +32,7 @@ struct nft_payload {
 	u8			offset;
 	u8			len;
 	enum nft_registers	dreg:8;
+	enum nft_registers	sreg:8;
 };
 
 extern const struct nft_expr_ops nft_payload_fast_ops;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 83c985a..7730a36 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -480,6 +480,7 @@ enum nft_payload_bases {
 /**
  * enum nft_payload_attributes - nf_tables payload expression netlink attributes
  *
+ * @NFTA_PAYLOAD_SREG: source register to load data from (NLA_U32: nft_registers)
  * @NFTA_PAYLOAD_DREG: destination register to load data into (NLA_U32: nft_registers)
  * @NFTA_PAYLOAD_BASE: payload base (NLA_U32: nft_payload_bases)
  * @NFTA_PAYLOAD_OFFSET: payload offset relative to base (NLA_U32)
@@ -491,6 +492,7 @@ enum nft_payload_attributes {
 	NFTA_PAYLOAD_BASE,
 	NFTA_PAYLOAD_OFFSET,
 	NFTA_PAYLOAD_LEN,
+	NFTA_PAYLOAD_SREG,
 	__NFTA_PAYLOAD_MAX
 };
 #define NFTA_PAYLOAD_MAX	(__NFTA_PAYLOAD_MAX - 1)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index a2aeb31..d27d4ed 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -14,22 +14,173 @@
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/inetdevice.h>
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
+#include <net/ip.h>
 
-static void nft_payload_eval(const struct nft_expr *expr,
-			     struct nft_data data[NFT_REG_MAX + 1],
-			     const struct nft_pktinfo *pkt)
+static u32 __mask_byte(u32 low, u32 high)
+{
+	return GENMASK_ULL(high * BITS_PER_BYTE - 1, low * BITS_PER_BYTE);
+}
+
+/**
+ * __adjust_cksum - adjust a checksum
+ * @sum: the original checksum which is to be adjusted
+ * @old: the original data
+ * @new: the new data
+ * @len: length which is going to change
+ * @off: alignment offset for @new to align with @old in bytes
+ * @alignbuf: pointer to the extra buffer for alignment
+ * @pseudo: calculating a pseudo header
+ *
+ * This function adjusts sum as if the bytes of @old were changed to @new >>
+ * 4 - @off (the shift here is a byte shift). @off is the offset counting from
+ * the LSB, so the offset from the MSB is 4 - off.
+ */
+static void __adjust_cksum(__sum16 *sum, u32 *old, u32 *new, u8 len, u32 off,
+			   u32 *alignbuf, bool pseudo)
+{
+	u32 mask, oldval, newval, written, tmpval, word, tot_words;
+
+	if (WARN_ON(len > 16))
+		return;
+
+	tot_words = roundup(len, 4) / 4;
+	/* align old with new and offset */
+	if (len > off && 4 - off > 0) {
+		memcpy((u8 *)alignbuf + 4 - off, new, len);
+		new = alignbuf;
+		tot_words = roundup(len + 4 - off, 4) / 4;
+	}
+
+	/* Initial bytes to write, either everything or the leftover space */
+	if (len <= off)
+		written = len;
+	else
+		written = off;
+	mask = __mask_byte(off - written, off);
+	for (word = 0; word < tot_words; word++, written += tmpval) {
+		oldval = ntohl(old[word]) & mask;
+		newval = ntohl(new[word]) & mask;
+		if (!pseudo) {
+			csum_replace4(sum, htonl(oldval), htonl(newval));
+		} else {
+			__be32 diff[] = { ~htonl(oldval), htonl(newval) };
+
+			*sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+					  csum_unfold(*sum)));
+		}
+		tmpval = len - written > 4 ? 4 : len - written;
+		mask = __mask_byte(4 - tmpval, 4);
+	}
+}
+
+static void nft_payload_fix_cksum(const struct nft_pktinfo *pkt,
+				  const struct nft_payload *priv,
+				  struct nft_data *source,
+				  int offset)
+{
+	u32 byteoff, *oldp, chkoff, alignbuf[5];
+	struct sk_buff *skb = pkt->skb;
+	__sum16 *chkp = NULL;
+	struct tcphdr *tcph;
+	struct udphdr *udph;
+	struct iphdr *iph;
+	bool pseudo = 0;
+	int nhoff;
+
+	if (pkt->xt.family != NFPROTO_IPV4 &&
+	    pkt->xt.family != NFPROTO_IPV6)
+		return;
+
+	nhoff = skb_network_offset(skb);
+	switch (priv->base) {
+	case NFT_PAYLOAD_LL_HEADER:
+		/* Entirely in LL header, no need to update */
+		if (offset + priv->len < nhoff)
+			break;
+	case NFT_PAYLOAD_NETWORK_HEADER:
+		if (offset >= pkt->xt.thoff)
+			goto do_th;
+		oldp = (u32 *)(skb->data + rounddown(offset, 4));
+		byteoff = 4 - (offset - rounddown(offset, 4));
+		if (pkt->xt.family == NFPROTO_IPV4) {
+			/* Make sure we can update the checksum */
+			iph = ip_hdr(skb);
+			chkoff = nhoff + offsetof(struct iphdr, check) +
+				 sizeof(iph->check);
+			if (offset + priv->len < chkoff &&
+			    !skb_make_writable(skb, chkoff))
+				break;
+			if (offset >= nhoff + offsetof(struct iphdr, saddr))
+				pseudo = 1;
+
+			__adjust_cksum(&iph->check, oldp, &source->data[0],
+				       priv->len, byteoff, alignbuf, 0);
+		} else {
+			if (offset >= nhoff + offsetof(struct ipv6hdr, saddr))
+				pseudo = 1;
+		}
+		if (!pseudo)
+			break;
+do_th:
+	case NFT_PAYLOAD_TRANSPORT_HEADER:
+			if (pkt->tprot == IPPROTO_TCP) {
+				tcph = (struct tcphdr *)(skb->data +
+							 pkt->xt.thoff);
+				chkoff = pkt->xt.thoff +
+					 offsetof(struct tcphdr, check) +
+					 sizeof(tcph->check);
+				chkp = &tcph->check;
+			} else if (pkt->tprot == IPPROTO_UDP) {
+				udph = (struct udphdr *)(skb->data +
+							 pkt->xt.thoff);
+				chkoff = pkt->xt.thoff +
+					 offsetof(struct udphdr, check) +
+					 sizeof(udph->check);
+				chkp = &udph->check;
+			}
+			/* Check if we have a valid protocol */
+			if (!chkp)
+				break;
+
+			/* Make sure we can update the checksum */
+			if (offset + priv->len < chkoff &&
+			    !skb_make_writable(skb, chkoff))
+				break;
+
+			/* We're here because the address got changed, update
+			 * only that as we don't allow cross-header writing.
+			 */
+			if (pseudo) {
+				__adjust_cksum(chkp, oldp, &source->data[0],
+					       priv->len, byteoff, alignbuf, 1);
+				break;
+			}
+
+			oldp = (u32 *)(skb->data + rounddown(priv->offset, 4));
+			byteoff = 4 - (offset - rounddown(priv->offset, 4));
+			__adjust_cksum(chkp, oldp, &source->data[0], priv->len,
+				       byteoff, alignbuf, 0);
+		break;
+	default:
+		break;
+	}
+}
+
+static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
+				   const struct nft_payload *priv)
 {
-	const struct nft_payload *priv = nft_expr_priv(expr);
 	const struct sk_buff *skb = pkt->skb;
-	struct nft_data *dest = &data[priv->dreg];
-	int offset;
+	int offset = -1;
 
 	switch (priv->base) {
 	case NFT_PAYLOAD_LL_HEADER:
 		if (!skb_mac_header_was_set(skb))
-			goto err;
+			return offset;
 		offset = skb_mac_header(skb) - skb->data;
 		break;
 	case NFT_PAYLOAD_NETWORK_HEADER:
@@ -43,14 +194,49 @@ static void nft_payload_eval(const struct nft_expr *expr,
 	}
 	offset += priv->offset;
 
-	if (skb_copy_bits(skb, offset, dest->data, priv->len) < 0)
+	return offset;
+}
+
+static void nft_payload_eval(const struct nft_expr *expr,
+			     struct nft_data data[NFT_REG_MAX + 1],
+			     const struct nft_pktinfo *pkt)
+{
+	const struct nft_payload *priv = nft_expr_priv(expr);
+	struct nft_data *dest = &data[priv->dreg];
+	int offset = nft_payload_make_offset(pkt, priv);
+
+	if (offset == -1)
+		goto err;
+	if (skb_copy_bits(pkt->skb, offset, dest->data, priv->len) < 0)
+		goto err;
+	return;
+err:
+	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
+}
+
+static void nft_payload_set_eval(const struct nft_expr *expr,
+				 struct nft_data data[NFT_REG_MAX + 1],
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_payload *priv = nft_expr_priv(expr);
+	struct nft_data *source = &data[priv->sreg];
+	int offset = nft_payload_make_offset(pkt, priv);
+
+	if (offset == -1)
+		goto err;
+	if (!skb_make_writable(pkt->skb, offset + priv->len))
 		goto err;
+
+	nft_payload_fix_cksum(pkt, priv, source, offset);
+	memcpy(pkt->skb->data + offset, source, priv->len);
+
 	return;
 err:
 	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
 }
 
 static const struct nla_policy nft_payload_policy[NFTA_PAYLOAD_MAX + 1] = {
+	[NFTA_PAYLOAD_SREG]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_DREG]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_BASE]	= { .type = NLA_U32 },
 	[NFTA_PAYLOAD_OFFSET]	= { .type = NLA_U32 },
@@ -75,12 +261,35 @@ static int nft_payload_init(const struct nft_ctx *ctx,
 	return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE);
 }
 
+static int nft_payload_set_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_payload *priv = nft_expr_priv(expr);
+	int err;
+
+	priv->base   = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
+	priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
+	priv->len    = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+
+	priv->sreg = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_SREG]));
+	err = nft_validate_input_register(priv->sreg);
+
+	return err;
+}
+
 static int nft_payload_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_payload *priv = nft_expr_priv(expr);
 
-	if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)) ||
-	    nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) ||
+	if (priv->sreg) {
+		if (nla_put_be32(skb, NFTA_PAYLOAD_SREG, htonl(priv->sreg)))
+			goto nla_put_failure;
+	} else {
+		if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)))
+			goto nla_put_failure;
+	}
+	if (nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) ||
 	    nla_put_be32(skb, NFTA_PAYLOAD_OFFSET, htonl(priv->offset)) ||
 	    nla_put_be32(skb, NFTA_PAYLOAD_LEN, htonl(priv->len)))
 		goto nla_put_failure;
@@ -107,6 +316,14 @@ const struct nft_expr_ops nft_payload_fast_ops = {
 	.dump		= nft_payload_dump,
 };
 
+static const struct nft_expr_ops nft_payload_set_ops = {
+	.type		= &nft_payload_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload)),
+	.eval		= nft_payload_set_eval,
+	.init		= nft_payload_set_init,
+	.dump		= nft_payload_dump,
+};
+
 static const struct nft_expr_ops *
 nft_payload_select_ops(const struct nft_ctx *ctx,
 		       const struct nlattr * const tb[])
@@ -114,11 +331,14 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
 	enum nft_payload_bases base;
 	unsigned int offset, len;
 
-	if (tb[NFTA_PAYLOAD_DREG] == NULL ||
+	if ((tb[NFTA_PAYLOAD_SREG] == NULL &&
+	    tb[NFTA_PAYLOAD_DREG] == NULL) ||
 	    tb[NFTA_PAYLOAD_BASE] == NULL ||
 	    tb[NFTA_PAYLOAD_OFFSET] == NULL ||
 	    tb[NFTA_PAYLOAD_LEN] == NULL)
 		return ERR_PTR(-EINVAL);
+	if (tb[NFTA_PAYLOAD_SREG] && tb[NFTA_PAYLOAD_DREG])
+		return ERR_PTR(-EINVAL);
 
 	base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
 	switch (base) {
@@ -135,10 +355,15 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
 	if (len == 0 || len > FIELD_SIZEOF(struct nft_data, data))
 		return ERR_PTR(-EINVAL);
 
-	if (len <= 4 && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
-		return &nft_payload_fast_ops;
-	else
-		return &nft_payload_ops;
+	if (tb[NFTA_PAYLOAD_SREG]) {
+		return &nft_payload_set_ops;
+	} else {
+		if (len <= 4 && IS_ALIGNED(offset, len) &&
+		    base != NFT_PAYLOAD_LL_HEADER)
+			return &nft_payload_fast_ops;
+		else
+			return &nft_payload_ops;
+	}
 }
 
 static struct nft_expr_type nft_payload_type __read_mostly = {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-23 17:32 Nikolay Aleksandrov
@ 2014-02-23 18:09 ` Patrick McHardy
  2014-02-23 18:34   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2014-02-23 18:09 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netfilter-devel, pablo

On Sun, Feb 23, 2014 at 06:32:22PM +0100, Nikolay Aleksandrov wrote:
> This patch extends the payload expression to support packet writing.
> The new payload attribute - SREG specifies the source register to use
> when changing packet data, the rest of the attributes are the same:
> base - where to start from
> offset - offset in the packet
> len - length to write
> 
> The DREG attribute should not be set if writing is intended, if both
> attributes are set an error will be returned.
> 
> The checksum update is done automatically for the following cases:
> IPv4 checksum - changing the IPv4 header
> TCP checksum - changing addresses in the network header (pseudo) or
>                changing TCP header/packet data
> UDP checksum - same as TCP
> The pseudo header works for both IPv4 and IPv6.
> 
> The following restrictions apply:
> - Cross-header writing (NH -> TH) won't get their checksum updated
>   properly.
> - The "checksum" fields of the respective headers should not be altered.
> - If altering an address in the network header, the write should not
>   alter any other field. (This is okay for IPv4 as the previous field
>   is the checksum, but changing the IPv6 "hop limit" and the first bytes
>   of the source address should not be done for example)
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> This is _strictly_ RFC, I have some cleanups to do and a ton of tests
> to run :-)

Thanks for your efforts so far. However regarding checksumming, encoding
all this protocol knowledge into the kernel is against the concepts we
have so far and also kind of unncessary, userspace has this knowledge
anyways. So I'd propose to change the checksumming in the following way:

- userspace specifies a checksum offset (header base and offset)
- userspace specifies a checksum type (CSUM_TYPE_INET or something like this)
- userspace specifies whether the change affects a pseudo header
- userspace specifies the pseudo offset if required

The checksum update can then simply be done one or two calls to
inet_proto_csum_replaceX(). Userspace is responsible for doing updates
in steps that result in valid checksums (IOW, don't mix updates that
affect the pseudo header with other updates).

> Now some explanation, this is the simplest way of doing it that I could
> think of. I also have a version which doesn't copy in order to align but
> writes directly in "broken" number of bytes i.e. if we had to shift 3 bytes
> in order to round offset then on each iteration it'll either write 3 bytes
> or 1 byte with heavy use of bitfield manipulation and asynchronous word
> counting (i.e. one for the src and one for the dst). I don't believe that
> the speed difference will be big so I prefer the simpler approach, if you
> have any suggestion I'm willing to change the algorithm :-)
> 
> I think the rest is in the commit message, I'll continue to clean it up and
> run some more tests. I think I've addressed all comments from the previous
> RFC posting.
> This patch should apply to Dave's net-next tree.
> 
>  include/net/netfilter/nf_tables_core.h   |   1 +
>  include/uapi/linux/netfilter/nf_tables.h |   2 +
>  net/netfilter/nft_payload.c              | 255 +++++++++++++++++++++++++++++--
>  3 files changed, 243 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
> index cf2b7ae..0a0bd9a 100644
> --- a/include/net/netfilter/nf_tables_core.h
> +++ b/include/net/netfilter/nf_tables_core.h
> @@ -32,6 +32,7 @@ struct nft_payload {
>  	u8			offset;
>  	u8			len;
>  	enum nft_registers	dreg:8;
> +	enum nft_registers	sreg:8;
>  };
>  
>  extern const struct nft_expr_ops nft_payload_fast_ops;
> diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
> index 83c985a..7730a36 100644
> --- a/include/uapi/linux/netfilter/nf_tables.h
> +++ b/include/uapi/linux/netfilter/nf_tables.h
> @@ -480,6 +480,7 @@ enum nft_payload_bases {
>  /**
>   * enum nft_payload_attributes - nf_tables payload expression netlink attributes
>   *
> + * @NFTA_PAYLOAD_SREG: source register to load data from (NLA_U32: nft_registers)
>   * @NFTA_PAYLOAD_DREG: destination register to load data into (NLA_U32: nft_registers)
>   * @NFTA_PAYLOAD_BASE: payload base (NLA_U32: nft_payload_bases)
>   * @NFTA_PAYLOAD_OFFSET: payload offset relative to base (NLA_U32)
> @@ -491,6 +492,7 @@ enum nft_payload_attributes {
>  	NFTA_PAYLOAD_BASE,
>  	NFTA_PAYLOAD_OFFSET,
>  	NFTA_PAYLOAD_LEN,
> +	NFTA_PAYLOAD_SREG,
>  	__NFTA_PAYLOAD_MAX
>  };
>  #define NFTA_PAYLOAD_MAX	(__NFTA_PAYLOAD_MAX - 1)
> diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
> index a2aeb31..d27d4ed 100644
> --- a/net/netfilter/nft_payload.c
> +++ b/net/netfilter/nft_payload.c
> @@ -14,22 +14,173 @@
>  #include <linux/netlink.h>
>  #include <linux/netfilter.h>
>  #include <linux/netfilter/nf_tables.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <linux/inetdevice.h>
>  #include <net/netfilter/nf_tables_core.h>
>  #include <net/netfilter/nf_tables.h>
> +#include <net/ip.h>
>  
> -static void nft_payload_eval(const struct nft_expr *expr,
> -			     struct nft_data data[NFT_REG_MAX + 1],
> -			     const struct nft_pktinfo *pkt)
> +static u32 __mask_byte(u32 low, u32 high)
> +{
> +	return GENMASK_ULL(high * BITS_PER_BYTE - 1, low * BITS_PER_BYTE);
> +}
> +
> +/**
> + * __adjust_cksum - adjust a checksum
> + * @sum: the original checksum which is to be adjusted
> + * @old: the original data
> + * @new: the new data
> + * @len: length which is going to change
> + * @off: alignment offset for @new to align with @old in bytes
> + * @alignbuf: pointer to the extra buffer for alignment
> + * @pseudo: calculating a pseudo header
> + *
> + * This function adjusts sum as if the bytes of @old were changed to @new >>
> + * 4 - @off (the shift here is a byte shift). @off is the offset counting from
> + * the LSB, so the offset from the MSB is 4 - off.
> + */
> +static void __adjust_cksum(__sum16 *sum, u32 *old, u32 *new, u8 len, u32 off,
> +			   u32 *alignbuf, bool pseudo)
> +{
> +	u32 mask, oldval, newval, written, tmpval, word, tot_words;
> +
> +	if (WARN_ON(len > 16))
> +		return;
> +
> +	tot_words = roundup(len, 4) / 4;
> +	/* align old with new and offset */
> +	if (len > off && 4 - off > 0) {
> +		memcpy((u8 *)alignbuf + 4 - off, new, len);
> +		new = alignbuf;
> +		tot_words = roundup(len + 4 - off, 4) / 4;
> +	}
> +
> +	/* Initial bytes to write, either everything or the leftover space */
> +	if (len <= off)
> +		written = len;
> +	else
> +		written = off;
> +	mask = __mask_byte(off - written, off);
> +	for (word = 0; word < tot_words; word++, written += tmpval) {
> +		oldval = ntohl(old[word]) & mask;
> +		newval = ntohl(new[word]) & mask;
> +		if (!pseudo) {
> +			csum_replace4(sum, htonl(oldval), htonl(newval));
> +		} else {
> +			__be32 diff[] = { ~htonl(oldval), htonl(newval) };
> +
> +			*sum = ~csum_fold(csum_partial(diff, sizeof(diff),
> +					  csum_unfold(*sum)));
> +		}
> +		tmpval = len - written > 4 ? 4 : len - written;
> +		mask = __mask_byte(4 - tmpval, 4);
> +	}
> +}
> +
> +static void nft_payload_fix_cksum(const struct nft_pktinfo *pkt,
> +				  const struct nft_payload *priv,
> +				  struct nft_data *source,
> +				  int offset)
> +{
> +	u32 byteoff, *oldp, chkoff, alignbuf[5];
> +	struct sk_buff *skb = pkt->skb;
> +	__sum16 *chkp = NULL;
> +	struct tcphdr *tcph;
> +	struct udphdr *udph;
> +	struct iphdr *iph;
> +	bool pseudo = 0;
> +	int nhoff;
> +
> +	if (pkt->xt.family != NFPROTO_IPV4 &&
> +	    pkt->xt.family != NFPROTO_IPV6)
> +		return;
> +
> +	nhoff = skb_network_offset(skb);
> +	switch (priv->base) {
> +	case NFT_PAYLOAD_LL_HEADER:
> +		/* Entirely in LL header, no need to update */
> +		if (offset + priv->len < nhoff)
> +			break;
> +	case NFT_PAYLOAD_NETWORK_HEADER:
> +		if (offset >= pkt->xt.thoff)
> +			goto do_th;
> +		oldp = (u32 *)(skb->data + rounddown(offset, 4));
> +		byteoff = 4 - (offset - rounddown(offset, 4));
> +		if (pkt->xt.family == NFPROTO_IPV4) {
> +			/* Make sure we can update the checksum */
> +			iph = ip_hdr(skb);
> +			chkoff = nhoff + offsetof(struct iphdr, check) +
> +				 sizeof(iph->check);
> +			if (offset + priv->len < chkoff &&
> +			    !skb_make_writable(skb, chkoff))
> +				break;
> +			if (offset >= nhoff + offsetof(struct iphdr, saddr))
> +				pseudo = 1;
> +
> +			__adjust_cksum(&iph->check, oldp, &source->data[0],
> +				       priv->len, byteoff, alignbuf, 0);
> +		} else {
> +			if (offset >= nhoff + offsetof(struct ipv6hdr, saddr))
> +				pseudo = 1;
> +		}
> +		if (!pseudo)
> +			break;
> +do_th:
> +	case NFT_PAYLOAD_TRANSPORT_HEADER:
> +			if (pkt->tprot == IPPROTO_TCP) {
> +				tcph = (struct tcphdr *)(skb->data +
> +							 pkt->xt.thoff);
> +				chkoff = pkt->xt.thoff +
> +					 offsetof(struct tcphdr, check) +
> +					 sizeof(tcph->check);
> +				chkp = &tcph->check;
> +			} else if (pkt->tprot == IPPROTO_UDP) {
> +				udph = (struct udphdr *)(skb->data +
> +							 pkt->xt.thoff);
> +				chkoff = pkt->xt.thoff +
> +					 offsetof(struct udphdr, check) +
> +					 sizeof(udph->check);
> +				chkp = &udph->check;
> +			}
> +			/* Check if we have a valid protocol */
> +			if (!chkp)
> +				break;
> +
> +			/* Make sure we can update the checksum */
> +			if (offset + priv->len < chkoff &&
> +			    !skb_make_writable(skb, chkoff))
> +				break;
> +
> +			/* We're here because the address got changed, update
> +			 * only that as we don't allow cross-header writing.
> +			 */
> +			if (pseudo) {
> +				__adjust_cksum(chkp, oldp, &source->data[0],
> +					       priv->len, byteoff, alignbuf, 1);
> +				break;
> +			}
> +
> +			oldp = (u32 *)(skb->data + rounddown(priv->offset, 4));
> +			byteoff = 4 - (offset - rounddown(priv->offset, 4));
> +			__adjust_cksum(chkp, oldp, &source->data[0], priv->len,
> +				       byteoff, alignbuf, 0);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static int nft_payload_make_offset(const struct nft_pktinfo *pkt,
> +				   const struct nft_payload *priv)
>  {
> -	const struct nft_payload *priv = nft_expr_priv(expr);
>  	const struct sk_buff *skb = pkt->skb;
> -	struct nft_data *dest = &data[priv->dreg];
> -	int offset;
> +	int offset = -1;
>  
>  	switch (priv->base) {
>  	case NFT_PAYLOAD_LL_HEADER:
>  		if (!skb_mac_header_was_set(skb))
> -			goto err;
> +			return offset;
>  		offset = skb_mac_header(skb) - skb->data;
>  		break;
>  	case NFT_PAYLOAD_NETWORK_HEADER:
> @@ -43,14 +194,49 @@ static void nft_payload_eval(const struct nft_expr *expr,
>  	}
>  	offset += priv->offset;
>  
> -	if (skb_copy_bits(skb, offset, dest->data, priv->len) < 0)
> +	return offset;
> +}
> +
> +static void nft_payload_eval(const struct nft_expr *expr,
> +			     struct nft_data data[NFT_REG_MAX + 1],
> +			     const struct nft_pktinfo *pkt)
> +{
> +	const struct nft_payload *priv = nft_expr_priv(expr);
> +	struct nft_data *dest = &data[priv->dreg];
> +	int offset = nft_payload_make_offset(pkt, priv);
> +
> +	if (offset == -1)
> +		goto err;
> +	if (skb_copy_bits(pkt->skb, offset, dest->data, priv->len) < 0)
> +		goto err;
> +	return;
> +err:
> +	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
> +}
> +
> +static void nft_payload_set_eval(const struct nft_expr *expr,
> +				 struct nft_data data[NFT_REG_MAX + 1],
> +				 const struct nft_pktinfo *pkt)
> +{
> +	const struct nft_payload *priv = nft_expr_priv(expr);
> +	struct nft_data *source = &data[priv->sreg];
> +	int offset = nft_payload_make_offset(pkt, priv);
> +
> +	if (offset == -1)
> +		goto err;
> +	if (!skb_make_writable(pkt->skb, offset + priv->len))
>  		goto err;
> +
> +	nft_payload_fix_cksum(pkt, priv, source, offset);
> +	memcpy(pkt->skb->data + offset, source, priv->len);
> +
>  	return;
>  err:
>  	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
>  }
>  
>  static const struct nla_policy nft_payload_policy[NFTA_PAYLOAD_MAX + 1] = {
> +	[NFTA_PAYLOAD_SREG]	= { .type = NLA_U32 },
>  	[NFTA_PAYLOAD_DREG]	= { .type = NLA_U32 },
>  	[NFTA_PAYLOAD_BASE]	= { .type = NLA_U32 },
>  	[NFTA_PAYLOAD_OFFSET]	= { .type = NLA_U32 },
> @@ -75,12 +261,35 @@ static int nft_payload_init(const struct nft_ctx *ctx,
>  	return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE);
>  }
>  
> +static int nft_payload_set_init(const struct nft_ctx *ctx,
> +				const struct nft_expr *expr,
> +				const struct nlattr * const tb[])
> +{
> +	struct nft_payload *priv = nft_expr_priv(expr);
> +	int err;
> +
> +	priv->base   = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
> +	priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
> +	priv->len    = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
> +
> +	priv->sreg = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_SREG]));
> +	err = nft_validate_input_register(priv->sreg);
> +
> +	return err;
> +}
> +
>  static int nft_payload_dump(struct sk_buff *skb, const struct nft_expr *expr)
>  {
>  	const struct nft_payload *priv = nft_expr_priv(expr);
>  
> -	if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)) ||
> -	    nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) ||
> +	if (priv->sreg) {
> +		if (nla_put_be32(skb, NFTA_PAYLOAD_SREG, htonl(priv->sreg)))
> +			goto nla_put_failure;
> +	} else {
> +		if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)))
> +			goto nla_put_failure;
> +	}
> +	if (nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) ||
>  	    nla_put_be32(skb, NFTA_PAYLOAD_OFFSET, htonl(priv->offset)) ||
>  	    nla_put_be32(skb, NFTA_PAYLOAD_LEN, htonl(priv->len)))
>  		goto nla_put_failure;
> @@ -107,6 +316,14 @@ const struct nft_expr_ops nft_payload_fast_ops = {
>  	.dump		= nft_payload_dump,
>  };
>  
> +static const struct nft_expr_ops nft_payload_set_ops = {
> +	.type		= &nft_payload_type,
> +	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload)),
> +	.eval		= nft_payload_set_eval,
> +	.init		= nft_payload_set_init,
> +	.dump		= nft_payload_dump,
> +};
> +
>  static const struct nft_expr_ops *
>  nft_payload_select_ops(const struct nft_ctx *ctx,
>  		       const struct nlattr * const tb[])
> @@ -114,11 +331,14 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
>  	enum nft_payload_bases base;
>  	unsigned int offset, len;
>  
> -	if (tb[NFTA_PAYLOAD_DREG] == NULL ||
> +	if ((tb[NFTA_PAYLOAD_SREG] == NULL &&
> +	    tb[NFTA_PAYLOAD_DREG] == NULL) ||
>  	    tb[NFTA_PAYLOAD_BASE] == NULL ||
>  	    tb[NFTA_PAYLOAD_OFFSET] == NULL ||
>  	    tb[NFTA_PAYLOAD_LEN] == NULL)
>  		return ERR_PTR(-EINVAL);
> +	if (tb[NFTA_PAYLOAD_SREG] && tb[NFTA_PAYLOAD_DREG])
> +		return ERR_PTR(-EINVAL);
>  
>  	base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
>  	switch (base) {
> @@ -135,10 +355,15 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
>  	if (len == 0 || len > FIELD_SIZEOF(struct nft_data, data))
>  		return ERR_PTR(-EINVAL);
>  
> -	if (len <= 4 && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
> -		return &nft_payload_fast_ops;
> -	else
> -		return &nft_payload_ops;
> +	if (tb[NFTA_PAYLOAD_SREG]) {
> +		return &nft_payload_set_ops;
> +	} else {
> +		if (len <= 4 && IS_ALIGNED(offset, len) &&
> +		    base != NFT_PAYLOAD_LL_HEADER)
> +			return &nft_payload_fast_ops;
> +		else
> +			return &nft_payload_ops;
> +	}
>  }
>  
>  static struct nft_expr_type nft_payload_type __read_mostly = {
> -- 
> 1.8.4.2

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] netfilter: nf_tables: extend payload to support writing data
  2014-02-23 18:09 ` Patrick McHardy
@ 2014-02-23 18:34   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolay Aleksandrov @ 2014-02-23 18:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, pablo

On 02/23/2014 07:09 PM, Patrick McHardy wrote:
> On Sun, Feb 23, 2014 at 06:32:22PM +0100, Nikolay Aleksandrov wrote:
>> This patch extends the payload expression to support packet writing.
>> The new payload attribute - SREG specifies the source register to use
>> when changing packet data, the rest of the attributes are the same:
>> base - where to start from
>> offset - offset in the packet
>> len - length to write
>>
>> The DREG attribute should not be set if writing is intended, if both
>> attributes are set an error will be returned.
>>
>> The checksum update is done automatically for the following cases:
>> IPv4 checksum - changing the IPv4 header
>> TCP checksum - changing addresses in the network header (pseudo) or
>>                changing TCP header/packet data
>> UDP checksum - same as TCP
>> The pseudo header works for both IPv4 and IPv6.
>>
>> The following restrictions apply:
>> - Cross-header writing (NH -> TH) won't get their checksum updated
>>   properly.
>> - The "checksum" fields of the respective headers should not be altered.
>> - If altering an address in the network header, the write should not
>>   alter any other field. (This is okay for IPv4 as the previous field
>>   is the checksum, but changing the IPv6 "hop limit" and the first bytes
>>   of the source address should not be done for example)
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> This is _strictly_ RFC, I have some cleanups to do and a ton of tests
>> to run :-)
> 
> Thanks for your efforts so far. However regarding checksumming, encoding
> all this protocol knowledge into the kernel is against the concepts we
> have so far and also kind of unncessary, userspace has this knowledge
> anyways. So I'd propose to change the checksumming in the following way:
> 
> - userspace specifies a checksum offset (header base and offset)
> - userspace specifies a checksum type (CSUM_TYPE_INET or something like this)
> - userspace specifies whether the change affects a pseudo header
> - userspace specifies the pseudo offset if required
> 
> The checksum update can then simply be done one or two calls to
> inet_proto_csum_replaceX(). Userspace is responsible for doing updates
> in steps that result in valid checksums (IOW, don't mix updates that
> affect the pseudo header with other updates).
> 
Heh, I've misunderstood you the last time apparently and tried to do it all
automagically or to cover as much as possible cases.
Anyway, given that userspace specifies all of the above as you said it
should be nearly trivial to update the checksum properly.
I'll get to it next week.

Again thanks for the feedback,
 Nik


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-02-23 18:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-17 18:12 [RFC PATCH] netfilter: nf_tables: extend payload to support writing data Nikolay Aleksandrov
2014-02-17 18:37 ` Patrick McHardy
2014-02-17 18:43   ` Nikolay Aleksandrov
2014-02-17 18:46     ` Patrick McHardy
2014-02-19 16:12       ` Nikolay Aleksandrov
2014-02-19 16:25         ` Patrick McHardy
2014-02-19 16:22           ` Nikolay Aleksandrov
  -- strict thread matches above, loose matches on Subject: below --
2014-02-23 17:32 Nikolay Aleksandrov
2014-02-23 18:09 ` Patrick McHardy
2014-02-23 18:34   ` Nikolay Aleksandrov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).