[patch net-next 0/3] couple of reasm fixes

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch net-next 0/3] couple of reasm fixes
@ 2013-11-05 11:02 Jiri Pirko
  2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 11:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, mleitner,
	kuznet, jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Couple of patches to fix issues which were found on
TPROXY vs. IPV6 frag packets use cases.

Jiri Pirko (3):
  move skb_nfct_reasm into skbuff.h
  netfilter: ip6_tables: use reasm skb for matching
  fix skb_morph to preserve skb->sk and skb->destructor pointers

 include/linux/skbuff.h          | 11 +++++++++++
 include/net/ip_vs.h             |  8 --------
 net/core/skbuff.c               | 44 +++++++++++++++++++++++++----------------
 net/ipv6/netfilter/ip6_tables.c |  8 ++++++--
 net/netfilter/ipvs/ip_vs_core.c |  1 +
 5 files changed, 45 insertions(+), 27 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch net-next 1/3] move skb_nfct_reasm into skbuff.h
  2013-11-05 11:02 [patch net-next 0/3] couple of reasm fixes Jiri Pirko
@ 2013-11-05 11:02 ` Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-06  1:00   ` Simon Horman
  2013-11-05 11:02 ` [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching Jiri Pirko
  2013-11-05 11:02 ` [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers Jiri Pirko
  2 siblings, 2 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 11:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, mleitner,
	kuznet, jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/skbuff.h          | 11 +++++++++++
 include/net/ip_vs.h             |  8 --------
 net/netfilter/ipvs/ip_vs_core.c |  1 +
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2e153b6..ececdad 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2606,6 +2606,17 @@ static inline void nf_conntrack_put_reasm(struct sk_buff *skb)
 		kfree_skb(skb);
 }
 #endif
+#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+	return skb->nfct_reasm;
+}
+#else
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+	return NULL;
+}
+#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
 {
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index cd7275f..6dff2b6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -119,10 +119,6 @@ struct ip_vs_iphdr {
 
 /* Dependency to module: nf_defrag_ipv6 */
 #if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
-static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
-{
-	return skb->nfct_reasm;
-}
 static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
 				      int len, void *buffer,
 				      const struct ip_vs_iphdr *ipvsh)
@@ -134,10 +130,6 @@ static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
 	return skb_header_pointer(skb, offset, len, buffer);
 }
 #else
-static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
-{
-	return NULL;
-}
 static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
 				      int len, void *buffer,
 				      const struct ip_vs_iphdr *ipvsh)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 34fda62..085c242 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -43,6 +43,7 @@
 #include <net/ip6_checksum.h>
 #include <net/netns/generic.h>		/* net_generic() */
 
+#include <linux/skbuff.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 11:02 [patch net-next 0/3] couple of reasm fixes Jiri Pirko
  2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
@ 2013-11-05 11:02 ` Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-05 13:32   ` Florian Westphal
  2013-11-05 11:02 ` [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers Jiri Pirko
  2 siblings, 2 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 11:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, mleitner,
	kuznet, jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Currently, when ipv6 fragment goes through the netfilter, match
functions are called on them directly. This might cause match function
to fail. So benefit from the fact that nf_defrag_ipv6 constructs
reassembled skb for us and use this reassembled skb for matching.

This patch fixes for example following situation:
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/netfilter/ip6_tables.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 710238f..ec9cb1a 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -328,6 +328,10 @@ ip6t_do_table(struct sk_buff *skb,
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
+	struct sk_buff *reasm = skb_nfct_reasm(skb);
+
+	if (!reasm)
+		reasm = skb;
 
 	/* Initialization */
 	indev = in ? in->name : nulldevname;
@@ -368,7 +372,7 @@ ip6t_do_table(struct sk_buff *skb,
 
 		IP_NF_ASSERT(e);
 		acpar.thoff = 0;
-		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
+		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
 		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
  no_match:
 			e = ip6t_next_entry(e);
@@ -378,7 +382,7 @@ ip6t_do_table(struct sk_buff *skb,
 		xt_ematch_foreach(ematch, e) {
 			acpar.match     = ematch->u.kernel.match;
 			acpar.matchinfo = ematch->data;
-			if (!acpar.match->match(skb, &acpar))
+			if (!acpar.match->match(reasm, &acpar))
 				goto no_match;
 		}
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers
  2013-11-05 11:02 [patch net-next 0/3] couple of reasm fixes Jiri Pirko
  2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
  2013-11-05 11:02 ` [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching Jiri Pirko
@ 2013-11-05 11:02 ` Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-05 14:06   ` Eric Dumazet
  2 siblings, 2 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 11:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, mleitner,
	kuznet, jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Currently __skb_clone sets skb->sk and skb->destructor to NULL. This is
not right for skb_morph use case because skb->sk may be previously
set (e. g. by xt_TPROXY).

Also, during skb_morph the destructor should not be called. It might be
previously set, e. g. by xt_TPROXY to sock_edemux, and that would cause
put sk while skb is still in flight.

This patch fixes these.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/core/skbuff.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3735fad..21b320e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -515,7 +515,7 @@ static void skb_free_head(struct sk_buff *skb)
 		kfree(skb->head);
 }
 
-static void skb_release_data(struct sk_buff *skb)
+static void __skb_release_data(struct sk_buff *skb)
 {
 	if (!skb->cloned ||
 	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
@@ -579,16 +579,12 @@ static void kfree_skbmem(struct sk_buff *skb)
 	}
 }
 
-static void skb_release_head_state(struct sk_buff *skb)
+static void __skb_release_head_state(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
 #ifdef CONFIG_XFRM
 	secpath_put(skb->sp);
 #endif
-	if (skb->destructor) {
-		WARN_ON(in_irq());
-		skb->destructor(skb);
-	}
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	nf_conntrack_put(skb->nfct);
 #endif
@@ -607,12 +603,19 @@ static void skb_release_head_state(struct sk_buff *skb)
 #endif
 }
 
-/* Free everything but the sk_buff shell. */
-static void skb_release_all(struct sk_buff *skb)
+static void skb_release_head_state(struct sk_buff *skb)
+{
+	if (skb->destructor) {
+		WARN_ON(in_irq());
+		skb->destructor(skb);
+	}
+	__skb_release_head_state(skb);
+}
+
+static void skb_release_data(struct sk_buff *skb)
 {
-	skb_release_head_state(skb);
 	if (likely(skb->head))
-		skb_release_data(skb);
+		__skb_release_data(skb);
 }
 
 /**
@@ -626,7 +629,8 @@ static void skb_release_all(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
-	skb_release_all(skb);
+	skb_release_head_state(skb);
+	skb_release_data(skb);
 	kfree_skbmem(skb);
 }
 EXPORT_SYMBOL(__kfree_skb);
@@ -761,12 +765,11 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
  * You should not add any new code to this function.  Add it to
  * __copy_skb_header above instead.
  */
-static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
+static struct sk_buff *___skb_clone(struct sk_buff *n, struct sk_buff *skb)
 {
 #define C(x) n->x = skb->x
 
 	n->next = n->prev = NULL;
-	n->sk = NULL;
 	__copy_skb_header(n, skb);
 
 	C(len);
@@ -775,7 +778,6 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
 	n->cloned = 1;
 	n->nohdr = 0;
-	n->destructor = NULL;
 	C(tail);
 	C(end);
 	C(head);
@@ -791,6 +793,13 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 #undef C
 }
 
+static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
+{
+	n->sk = NULL;
+	n->destructor = NULL;
+	return ___skb_clone(n, skb);
+}
+
 /**
  *	skb_morph	-	morph one skb into another
  *	@dst: the skb to receive the contents
@@ -803,8 +812,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
  */
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 {
-	skb_release_all(dst);
-	return __skb_clone(dst, src);
+	__skb_release_head_state(dst);
+	skb_release_data(dst);
+	return ___skb_clone(dst, src);
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
@@ -1107,7 +1117,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		if (skb_has_frag_list(skb))
 			skb_clone_fraglist(skb);
 
-		skb_release_data(skb);
+		__skb_release_data(skb);
 	} else {
 		skb_free_head(skb);
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [patch net-next 1/3] move skb_nfct_reasm into skbuff.h
  2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
@ 2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-06  1:00   ` Simon Horman
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-11-05 11:50 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, kuznet,
	jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Em 05-11-2013 09:02, Jiri Pirko escreveu:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

> ---
>   include/linux/skbuff.h          | 11 +++++++++++
>   include/net/ip_vs.h             |  8 --------
>   net/netfilter/ipvs/ip_vs_core.c |  1 +
>   3 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2e153b6..ececdad 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2606,6 +2606,17 @@ static inline void nf_conntrack_put_reasm(struct sk_buff *skb)
>   		kfree_skb(skb);
>   }
>   #endif
> +#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
> +static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> +{
> +	return skb->nfct_reasm;
> +}
> +#else
> +static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> +{
> +	return NULL;
> +}
> +#endif
>   #ifdef CONFIG_BRIDGE_NETFILTER
>   static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
>   {
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index cd7275f..6dff2b6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -119,10 +119,6 @@ struct ip_vs_iphdr {
>
>   /* Dependency to module: nf_defrag_ipv6 */
>   #if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
> -static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> -{
> -	return skb->nfct_reasm;
> -}
>   static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>   				      int len, void *buffer,
>   				      const struct ip_vs_iphdr *ipvsh)
> @@ -134,10 +130,6 @@ static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>   	return skb_header_pointer(skb, offset, len, buffer);
>   }
>   #else
> -static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> -{
> -	return NULL;
> -}
>   static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>   				      int len, void *buffer,
>   				      const struct ip_vs_iphdr *ipvsh)
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 34fda62..085c242 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -43,6 +43,7 @@
>   #include <net/ip6_checksum.h>
>   #include <net/netns/generic.h>		/* net_generic() */
>
> +#include <linux/skbuff.h>
>   #include <linux/netfilter.h>
>   #include <linux/netfilter_ipv4.h>
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 11:02 ` [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching Jiri Pirko
@ 2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-05 13:32   ` Florian Westphal
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-11-05 11:50 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, kuznet,
	jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Em 05-11-2013 09:02, Jiri Pirko escreveu:
> Currently, when ipv6 fragment goes through the netfilter, match
> functions are called on them directly. This might cause match function
> to fail. So benefit from the fact that nf_defrag_ipv6 constructs
> reassembled skb for us and use this reassembled skb for matching.
>
> This patch fixes for example following situation:
> On HOSTA do:
> ip6tables -I INPUT -p icmpv6 -j DROP
> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>
> and on HOSTB you do:
> ping6 HOSTA -s2000    (MTU is 1500)
>
> Incoming echo requests will be filtered out on HOSTA. This issue does
> not occur with smaller packets than MTU (where fragmentation does not happen).
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

> ---
>   net/ipv6/netfilter/ip6_tables.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
> index 710238f..ec9cb1a 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -328,6 +328,10 @@ ip6t_do_table(struct sk_buff *skb,
>   	const struct xt_table_info *private;
>   	struct xt_action_param acpar;
>   	unsigned int addend;
> +	struct sk_buff *reasm = skb_nfct_reasm(skb);
> +
> +	if (!reasm)
> +		reasm = skb;
>
>   	/* Initialization */
>   	indev = in ? in->name : nulldevname;
> @@ -368,7 +372,7 @@ ip6t_do_table(struct sk_buff *skb,
>
>   		IP_NF_ASSERT(e);
>   		acpar.thoff = 0;
> -		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
> +		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
>   		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
>    no_match:
>   			e = ip6t_next_entry(e);
> @@ -378,7 +382,7 @@ ip6t_do_table(struct sk_buff *skb,
>   		xt_ematch_foreach(ematch, e) {
>   			acpar.match     = ematch->u.kernel.match;
>   			acpar.matchinfo = ematch->data;
> -			if (!acpar.match->match(skb, &acpar))
> +			if (!acpar.match->match(reasm, &acpar))
>   				goto no_match;
>   		}
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers
  2013-11-05 11:02 ` [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers Jiri Pirko
@ 2013-11-05 11:50   ` Marcelo Ricardo Leitner
  2013-11-05 14:06   ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Ricardo Leitner @ 2013-11-05 11:50 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber, kuznet,
	jmorris, wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam, fw

Em 05-11-2013 09:02, Jiri Pirko escreveu:
> Currently __skb_clone sets skb->sk and skb->destructor to NULL. This is
> not right for skb_morph use case because skb->sk may be previously
> set (e. g. by xt_TPROXY).
>
> Also, during skb_morph the destructor should not be called. It might be
> previously set, e. g. by xt_TPROXY to sock_edemux, and that would cause
> put sk while skb is still in flight.
>
> This patch fixes these.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

> ---
>   net/core/skbuff.c | 44 +++++++++++++++++++++++++++-----------------
>   1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3735fad..21b320e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -515,7 +515,7 @@ static void skb_free_head(struct sk_buff *skb)
>   		kfree(skb->head);
>   }
>
> -static void skb_release_data(struct sk_buff *skb)
> +static void __skb_release_data(struct sk_buff *skb)
>   {
>   	if (!skb->cloned ||
>   	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
> @@ -579,16 +579,12 @@ static void kfree_skbmem(struct sk_buff *skb)
>   	}
>   }
>
> -static void skb_release_head_state(struct sk_buff *skb)
> +static void __skb_release_head_state(struct sk_buff *skb)
>   {
>   	skb_dst_drop(skb);
>   #ifdef CONFIG_XFRM
>   	secpath_put(skb->sp);
>   #endif
> -	if (skb->destructor) {
> -		WARN_ON(in_irq());
> -		skb->destructor(skb);
> -	}
>   #if IS_ENABLED(CONFIG_NF_CONNTRACK)
>   	nf_conntrack_put(skb->nfct);
>   #endif
> @@ -607,12 +603,19 @@ static void skb_release_head_state(struct sk_buff *skb)
>   #endif
>   }
>
> -/* Free everything but the sk_buff shell. */
> -static void skb_release_all(struct sk_buff *skb)
> +static void skb_release_head_state(struct sk_buff *skb)
> +{
> +	if (skb->destructor) {
> +		WARN_ON(in_irq());
> +		skb->destructor(skb);
> +	}
> +	__skb_release_head_state(skb);
> +}
> +
> +static void skb_release_data(struct sk_buff *skb)
>   {
> -	skb_release_head_state(skb);
>   	if (likely(skb->head))
> -		skb_release_data(skb);
> +		__skb_release_data(skb);
>   }
>
>   /**
> @@ -626,7 +629,8 @@ static void skb_release_all(struct sk_buff *skb)
>
>   void __kfree_skb(struct sk_buff *skb)
>   {
> -	skb_release_all(skb);
> +	skb_release_head_state(skb);
> +	skb_release_data(skb);
>   	kfree_skbmem(skb);
>   }
>   EXPORT_SYMBOL(__kfree_skb);
> @@ -761,12 +765,11 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
>    * You should not add any new code to this function.  Add it to
>    * __copy_skb_header above instead.
>    */
> -static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
> +static struct sk_buff *___skb_clone(struct sk_buff *n, struct sk_buff *skb)
>   {
>   #define C(x) n->x = skb->x
>
>   	n->next = n->prev = NULL;
> -	n->sk = NULL;
>   	__copy_skb_header(n, skb);
>
>   	C(len);
> @@ -775,7 +778,6 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
>   	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
>   	n->cloned = 1;
>   	n->nohdr = 0;
> -	n->destructor = NULL;
>   	C(tail);
>   	C(end);
>   	C(head);
> @@ -791,6 +793,13 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
>   #undef C
>   }
>
> +static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
> +{
> +	n->sk = NULL;
> +	n->destructor = NULL;
> +	return ___skb_clone(n, skb);
> +}
> +
>   /**
>    *	skb_morph	-	morph one skb into another
>    *	@dst: the skb to receive the contents
> @@ -803,8 +812,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
>    */
>   struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
>   {
> -	skb_release_all(dst);
> -	return __skb_clone(dst, src);
> +	__skb_release_head_state(dst);
> +	skb_release_data(dst);
> +	return ___skb_clone(dst, src);
>   }
>   EXPORT_SYMBOL_GPL(skb_morph);
>
> @@ -1107,7 +1117,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>   		if (skb_has_frag_list(skb))
>   			skb_clone_fraglist(skb);
>
> -		skb_release_data(skb);
> +		__skb_release_data(skb);
>   	} else {
>   		skb_free_head(skb);
>   	}
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 11:02 ` [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
@ 2013-11-05 13:32   ` Florian Westphal
  2013-11-05 13:41     ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2013-11-05 13:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner, kuznet, jmorris, wensong, horms, ja, edumazet, pshelar,
	jasowang, alexander.h.duyck, coreteam, fw

Jiri Pirko <jiri@resnulli.us> wrote:
> This patch fixes for example following situation:
> On HOSTA do:
> ip6tables -I INPUT -p icmpv6 -j DROP
> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

untested:

-A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
-A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
-A INPUT -p icmpv6 -j DROP

> and on HOSTB you do:
> ping6 HOSTA -s2000    (MTU is 1500)
> 
> Incoming echo requests will be filtered out on HOSTA. This issue does
> not occur with smaller packets than MTU (where fragmentation does not happen).

Patrick, any reason not to kill the special-casing (ct has assigned helper or
unconfirmed conntrack) in __ipv6_conntrack_in() ?

This should make ipv6 frag behaviour consistent; right now its rather
confusing from ruleset point of view, especially the first packet
of a connection is always seen as reassembled.

So with Jiris rules

-A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
-A INPUT -p icmpv6 -j DROP

ping6 -s $bignum works for the first packet but not for subsequent ones
which is quite irritating.

This change would obviously have userspace visibility (e.g. -m frag
won't work anymore when conntrack is on), but so far I couldn't come
up with a scenario where a legitimate ruleset could break.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 13:32   ` Florian Westphal
@ 2013-11-05 13:41     ` Patrick McHardy
  2013-11-05 15:01       ` Jiri Pirko
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2013-11-05 13:41 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jiri Pirko, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
> Jiri Pirko <jiri@resnulli.us> wrote:
> > This patch fixes for example following situation:
> > On HOSTA do:
> > ip6tables -I INPUT -p icmpv6 -j DROP
> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> 
> untested:
> 
> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
> -A INPUT -p icmpv6 -j DROP
> 
> > and on HOSTB you do:
> > ping6 HOSTA -s2000    (MTU is 1500)
> > 
> > Incoming echo requests will be filtered out on HOSTA. This issue does
> > not occur with smaller packets than MTU (where fragmentation does not happen).
> 
> Patrick, any reason not to kill the special-casing (ct has assigned helper or
> unconfirmed conntrack) in __ipv6_conntrack_in() ?
> 
> This should make ipv6 frag behaviour consistent; right now its rather
> confusing from ruleset point of view, especially the first packet
> of a connection is always seen as reassembled.
> 
> So with Jiris rules
> 
> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> -A INPUT -p icmpv6 -j DROP
> 
> ping6 -s $bignum works for the first packet but not for subsequent ones
> which is quite irritating.

Well, the reason was to avoid unnecessary work doing refragmentation
unless really required. I know its rather complicated, but IPv6 has
always required treating fragments manually or using conntrack state.

I'm not objecting to changing this, but the patches as they are are
not the way to go. First, moving nfct_frag to struct sk_buff seems
like a real waste of space for this quite rare case. Also, we can't
just use the reassembled packet in ip6tables, when modifying it we
will still output the unchanged fragments. An last of all, we'll be
executing the rules on the reassembled packet multiple times, one
for each fragment.

So if someone wants to change this, simply *only* pass the reassembled
packet through the netfilter hooks and drop the fragments, as in IPv4.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers
  2013-11-05 11:02 ` [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
@ 2013-11-05 14:06   ` Eric Dumazet
  2013-11-05 14:47     ` Jiri Pirko
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2013-11-05 14:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner, kuznet, jmorris, wensong, horms, ja, edumazet, pshelar,
	jasowang, alexander.h.duyck, coreteam, fw

On Tue, 2013-11-05 at 12:02 +0100, Jiri Pirko wrote:
> Currently __skb_clone sets skb->sk and skb->destructor to NULL. This is
> not right for skb_morph use case because skb->sk may be previously
> set (e. g. by xt_TPROXY).
> 
> Also, during skb_morph the destructor should not be called. It might be
> previously set, e. g. by xt_TPROXY to sock_edemux, and that would cause
> put sk while skb is still in flight.

truesize alert. You should add some documentation that skb_morph()
must not be used in transmit path.

Its not clear to be how this can happen.

skb_morph() being used only from ipv4 defrag (or ipv6 reassembly).

Maybe the problem could be fixed by doing this defrag _before_ setting
skb->sk ?

Also, I would prefer you find a way to put all this logic inside
skb_morph() instead of adding such complexity in this already complex
code, I fear the compiler will generate slower code with your patch on
fast path.

Maybe something as simple as following (untested) patch ?

Note that the truesize concern might need some care.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3735fad5616e..afabfd6ef341 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -793,18 +793,28 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 
 /**
  *	skb_morph	-	morph one skb into another
- *	@dst: the skb to receive the contents
+ *	@skb: the skb to receive the contents
  *	@src: the skb to supply the contents
  *
  *	This is identical to skb_clone except that the target skb is
- *	supplied by the user.
+ *	supplied by the user, and that we keep target skb destructor in place,
+ *	meaning this can not be used in transmit path, as skb->truesize might
+ *	change.
  *
  *	The target skb is returned upon exit.
  */
-struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
+struct sk_buff *skb_morph(struct sk_buff *skb, struct sk_buff *src)
 {
-	skb_release_all(dst);
-	return __skb_clone(dst, src);
+	struct sock *save_sk = skb->sk;
+	void (*save_destructor)(struct sk_buff *) = skb->destructor;
+
+	skb->sk = NULL;
+	skb->destructor = NULL;
+	skb_release_all(skb);
+	__skb_clone(skb, src);
+	skb->sk = save_sk;
+	skb->destructor = save_destructor;
+	return skb;
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers
  2013-11-05 14:06   ` Eric Dumazet
@ 2013-11-05 14:47     ` Jiri Pirko
  0 siblings, 0 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 14:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner, kuznet, jmorris, wensong, horms, ja, edumazet, pshelar,
	jasowang, alexander.h.duyck, coreteam, fw

Tue, Nov 05, 2013 at 03:06:12PM CET, eric.dumazet@gmail.com wrote:
>On Tue, 2013-11-05 at 12:02 +0100, Jiri Pirko wrote:
>> Currently __skb_clone sets skb->sk and skb->destructor to NULL. This is
>> not right for skb_morph use case because skb->sk may be previously
>> set (e. g. by xt_TPROXY).
>> 
>> Also, during skb_morph the destructor should not be called. It might be
>> previously set, e. g. by xt_TPROXY to sock_edemux, and that would cause
>> put sk while skb is still in flight.
>
>truesize alert. You should add some documentation that skb_morph()
>must not be used in transmit path.
>
>Its not clear to be how this can happen.
>
>skb_morph() being used only from ipv4 defrag (or ipv6 reassembly).

nod

>
>Maybe the problem could be fixed by doing this defrag _before_ setting
>skb->sk ?

skb->sk is set for exmaple in xt_TPROXY that is before reassemly is
done, because reassembly is done after all the netfilter code pass the
skb through. I do not see how this can be changed in the way you are
suggesting.


>
>Also, I would prefer you find a way to put all this logic inside
>skb_morph() instead of adding such complexity in this already complex
>code, I fear the compiler will generate slower code with your patch on
>fast path.
>
>Maybe something as simple as following (untested) patch ?

I had very similar patch prepared. But I did not like it so I chose the
other way. I'm more or less okay with doing this this way though...

>
>Note that the truesize concern might need some care.
>
>diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>index 3735fad5616e..afabfd6ef341 100644
>--- a/net/core/skbuff.c
>+++ b/net/core/skbuff.c
>@@ -793,18 +793,28 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
> 
> /**
>  *	skb_morph	-	morph one skb into another
>- *	@dst: the skb to receive the contents
>+ *	@skb: the skb to receive the contents
>  *	@src: the skb to supply the contents
>  *
>  *	This is identical to skb_clone except that the target skb is
>- *	supplied by the user.
>+ *	supplied by the user, and that we keep target skb destructor in place,
>+ *	meaning this can not be used in transmit path, as skb->truesize might
>+ *	change.
>  *
>  *	The target skb is returned upon exit.
>  */
>-struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
>+struct sk_buff *skb_morph(struct sk_buff *skb, struct sk_buff *src)
> {
>-	skb_release_all(dst);
>-	return __skb_clone(dst, src);
>+	struct sock *save_sk = skb->sk;
>+	void (*save_destructor)(struct sk_buff *) = skb->destructor;
>+
>+	skb->sk = NULL;
>+	skb->destructor = NULL;
>+	skb_release_all(skb);
>+	__skb_clone(skb, src);
>+	skb->sk = save_sk;
>+	skb->destructor = save_destructor;
>+	return skb;
> }
> EXPORT_SYMBOL_GPL(skb_morph);
> 
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 13:41     ` Patrick McHardy
@ 2013-11-05 15:01       ` Jiri Pirko
  2013-11-05 15:39         ` Florian Westphal
  2013-11-05 18:16         ` Patrick McHardy
  0 siblings, 2 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 15:01 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
>> Jiri Pirko <jiri@resnulli.us> wrote:
>> > This patch fixes for example following situation:
>> > On HOSTA do:
>> > ip6tables -I INPUT -p icmpv6 -j DROP
>> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> 
>> untested:
>> 
>> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
>> -A INPUT -p icmpv6 -j DROP
>> 
>> > and on HOSTB you do:
>> > ping6 HOSTA -s2000    (MTU is 1500)
>> > 
>> > Incoming echo requests will be filtered out on HOSTA. This issue does
>> > not occur with smaller packets than MTU (where fragmentation does not happen).
>> 
>> Patrick, any reason not to kill the special-casing (ct has assigned helper or
>> unconfirmed conntrack) in __ipv6_conntrack_in() ?
>> 
>> This should make ipv6 frag behaviour consistent; right now its rather
>> confusing from ruleset point of view, especially the first packet
>> of a connection is always seen as reassembled.
>> 
>> So with Jiris rules
>> 
>> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> -A INPUT -p icmpv6 -j DROP
>> 
>> ping6 -s $bignum works for the first packet but not for subsequent ones
>> which is quite irritating.
>
>Well, the reason was to avoid unnecessary work doing refragmentation
>unless really required. I know its rather complicated, but IPv6 has
>always required treating fragments manually or using conntrack state.
>
>I'm not objecting to changing this, but the patches as they are are
>not the way to go. First, moving nfct_frag to struct sk_buff seems

I'm a bit lost. What "nfct_frag" are you reffering to here?

>like a real waste of space for this quite rare case. Also, we can't
>just use the reassembled packet in ip6tables, when modifying it we
>will still output the unchanged fragments. An last of all, we'll be
>executing the rules on the reassembled packet multiple times, one
>for each fragment.

Reassembled skb would be only used for matching where no changes takes
place.

End even though, the matching is now done for each fragment skb anyway. The
change is only to do it on different skb. I see no erformance or any
other problem in that.

>
>So if someone wants to change this, simply *only* pass the reassembled
>packet through the netfilter hooks and drop the fragments, as in IPv4.

This is unfortunatelly not possible because in forwarding use case, the
fragments have to be send out as they come in.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 15:01       ` Jiri Pirko
@ 2013-11-05 15:39         ` Florian Westphal
  2013-11-05 18:19           ` Patrick McHardy
  2013-11-05 18:16         ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2013-11-05 15:39 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Patrick McHardy, Florian Westphal, netdev, davem, pablo,
	netfilter-devel, yoshfuji, kadlec, mleitner, kuznet, jmorris,
	wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam

Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> >executing the rules on the reassembled packet multiple times, one
> >for each fragment.
> 
[..]
> End even though, the matching is now done for each fragment skb anyway. The
> change is only to do it on different skb. I see no erformance or any
> other problem in that.

One problem that comes to mind is that nfacct or quota match will
now account num_of_fragments * length_of_reassemled_skb bytes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 15:01       ` Jiri Pirko
  2013-11-05 15:39         ` Florian Westphal
@ 2013-11-05 18:16         ` Patrick McHardy
  2013-11-05 20:55           ` Jiri Pirko
  2013-11-06 14:18           ` Jiri Pirko
  1 sibling, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2013-11-05 18:16 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote:
> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
> >> Jiri Pirko <jiri@resnulli.us> wrote:
> >> > This patch fixes for example following situation:
> >> > On HOSTA do:
> >> > ip6tables -I INPUT -p icmpv6 -j DROP
> >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> 
> >> untested:
> >> 
> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
> >> -A INPUT -p icmpv6 -j DROP
> >> 
> >> > and on HOSTB you do:
> >> > ping6 HOSTA -s2000    (MTU is 1500)
> >> > 
> >> > Incoming echo requests will be filtered out on HOSTA. This issue does
> >> > not occur with smaller packets than MTU (where fragmentation does not happen).
> >> 
> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or
> >> unconfirmed conntrack) in __ipv6_conntrack_in() ?
> >> 
> >> This should make ipv6 frag behaviour consistent; right now its rather
> >> confusing from ruleset point of view, especially the first packet
> >> of a connection is always seen as reassembled.
> >> 
> >> So with Jiris rules
> >> 
> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> -A INPUT -p icmpv6 -j DROP
> >> 
> >> ping6 -s $bignum works for the first packet but not for subsequent ones
> >> which is quite irritating.
> >
> >Well, the reason was to avoid unnecessary work doing refragmentation
> >unless really required. I know its rather complicated, but IPv6 has
> >always required treating fragments manually or using conntrack state.
> >
> >I'm not objecting to changing this, but the patches as they are are
> >not the way to go. First, moving nfct_frag to struct sk_buff seems
> 
> I'm a bit lost. What "nfct_frag" are you reffering to here?

I meant nfct_reasm of course.

> >like a real waste of space for this quite rare case. Also, we can't
> >just use the reassembled packet in ip6tables, when modifying it we
> >will still output the unchanged fragments. An last of all, we'll be
> >executing the rules on the reassembled packet multiple times, one
> >for each fragment.
> 
> Reassembled skb would be only used for matching where no changes takes
> place.

That still doesn't work, our matches are not purely passive.

> End even though, the matching is now done for each fragment skb anyway. The
> change is only to do it on different skb. I see no erformance or any
> other problem in that.

Accounting, quota, statistic, limit, ... come to mind. Basically any
match that keeps state.

> >So if someone wants to change this, simply *only* pass the reassembled
> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> 
> This is unfortunatelly not possible because in forwarding use case, the
> fragments have to be send out as they come in.

No, the IPv6 NAT patches fixed that, we still do proper refragmentation
and we still respect the original fragment sizes, thus are not responsible
for potentially exceeding the PMTU on the following path.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 15:39         ` Florian Westphal
@ 2013-11-05 18:19           ` Patrick McHardy
  2013-11-05 18:21             ` Jiri Pirko
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2013-11-05 18:19 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jiri Pirko, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote:
> Jiri Pirko <jiri@resnulli.us> wrote:
> > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> > >executing the rules on the reassembled packet multiple times, one
> > >for each fragment.
> > 
> [..]
> > End even though, the matching is now done for each fragment skb anyway. The
> > change is only to do it on different skb. I see no erformance or any
> > other problem in that.
> 
> One problem that comes to mind is that nfacct or quota match will
> now account num_of_fragments * length_of_reassemled_skb bytes.

indeed. The easiest way to fix all this (and, btw, also the
pskb_expand_head() oops which is currently reported by multiple people)
is to get rid of all the fragmentation handling and simply use the
reassembled skb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 18:19           ` Patrick McHardy
@ 2013-11-05 18:21             ` Jiri Pirko
  0 siblings, 0 replies; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 18:21 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

Tue, Nov 05, 2013 at 07:19:24PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote:
>> Jiri Pirko <jiri@resnulli.us> wrote:
>> > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>> > >executing the rules on the reassembled packet multiple times, one
>> > >for each fragment.
>> > 
>> [..]
>> > End even though, the matching is now done for each fragment skb anyway. The
>> > change is only to do it on different skb. I see no erformance or any
>> > other problem in that.
>> 
>> One problem that comes to mind is that nfacct or quota match will
>> now account num_of_fragments * length_of_reassemled_skb bytes.
>
>indeed. The easiest way to fix all this (and, btw, also the
>pskb_expand_head() oops which is currently reported by multiple people)
>is to get rid of all the fragmentation handling and simply use the
>reassembled skb.

Okay. That will resolve the skb->sk rewrite problem as well. I will
prepare a patch. 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 18:16         ` Patrick McHardy
@ 2013-11-05 20:55           ` Jiri Pirko
  2013-11-05 22:02             ` Patrick McHardy
  2013-11-06 14:18           ` Jiri Pirko
  1 sibling, 1 reply; 25+ messages in thread
From: Jiri Pirko @ 2013-11-05 20:55 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote:
>> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>> >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
>> >> Jiri Pirko <jiri@resnulli.us> wrote:
>> >> > This patch fixes for example following situation:
>> >> > On HOSTA do:
>> >> > ip6tables -I INPUT -p icmpv6 -j DROP
>> >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> 
>> >> untested:
>> >> 
>> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
>> >> -A INPUT -p icmpv6 -j DROP
>> >> 
>> >> > and on HOSTB you do:
>> >> > ping6 HOSTA -s2000    (MTU is 1500)
>> >> > 
>> >> > Incoming echo requests will be filtered out on HOSTA. This issue does
>> >> > not occur with smaller packets than MTU (where fragmentation does not happen).
>> >> 
>> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or
>> >> unconfirmed conntrack) in __ipv6_conntrack_in() ?
>> >> 
>> >> This should make ipv6 frag behaviour consistent; right now its rather
>> >> confusing from ruleset point of view, especially the first packet
>> >> of a connection is always seen as reassembled.
>> >> 
>> >> So with Jiris rules
>> >> 
>> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> -A INPUT -p icmpv6 -j DROP
>> >> 
>> >> ping6 -s $bignum works for the first packet but not for subsequent ones
>> >> which is quite irritating.
>> >
>> >Well, the reason was to avoid unnecessary work doing refragmentation
>> >unless really required. I know its rather complicated, but IPv6 has
>> >always required treating fragments manually or using conntrack state.
>> >
>> >I'm not objecting to changing this, but the patches as they are are
>> >not the way to go. First, moving nfct_frag to struct sk_buff seems
>> 
>> I'm a bit lost. What "nfct_frag" are you reffering to here?
>
>I meant nfct_reasm of course.

The patch is not moving this to struct sk_buff. It is already there.

>
>> >like a real waste of space for this quite rare case. Also, we can't
>> >just use the reassembled packet in ip6tables, when modifying it we
>> >will still output the unchanged fragments. An last of all, we'll be
>> >executing the rules on the reassembled packet multiple times, one
>> >for each fragment.
>> 
>> Reassembled skb would be only used for matching where no changes takes
>> place.
>
>That still doesn't work, our matches are not purely passive.
>
>> End even though, the matching is now done for each fragment skb anyway. The
>> change is only to do it on different skb. I see no erformance or any
>> other problem in that.
>
>Accounting, quota, statistic, limit, ... come to mind. Basically any
>match that keeps state.

Ok. Makes sense.

>
>> >So if someone wants to change this, simply *only* pass the reassembled
>> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> 
>> This is unfortunatelly not possible because in forwarding use case, the
>> fragments have to be send out as they come in.
>
>No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>and we still respect the original fragment sizes, thus are not responsible
>for potentially exceeding the PMTU on the following path.

Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
code entirely and use net/ipv6/reassembly.c code directly from
nf_defrag_ipv6. This would result in very similar code currently ipv4
has. 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 20:55           ` Jiri Pirko
@ 2013-11-05 22:02             ` Patrick McHardy
  0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2013-11-05 22:02 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

On Tue, Nov 05, 2013 at 09:55:20PM +0100, Jiri Pirko wrote:
> Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote:
>
> >> I'm a bit lost. What "nfct_frag" are you reffering to here?
> >
> >I meant nfct_reasm of course.
> 
> The patch is not moving this to struct sk_buff. It is already there.

Right again, sorry, I was replying to Florian's mail without the
quoted patch, so I seem to have mixed up things in between :)

> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> 
> >> This is unfortunatelly not possible because in forwarding use case, the
> >> fragments have to be send out as they come in.
> >
> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >and we still respect the original fragment sizes, thus are not responsible
> >for potentially exceeding the PMTU on the following path.
> 
> Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
> code entirely and use net/ipv6/reassembly.c code directly from
> nf_defrag_ipv6. This would result in very similar code currently ipv4
> has. 

If its possible to use net/ipv6/reassembly.c directly, even better,
so far I was thinking of just getting rid of the fragment replay,
but you're most likely right that this is the proper way to do it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 1/3] move skb_nfct_reasm into skbuff.h
  2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
  2013-11-05 11:50   ` Marcelo Ricardo Leitner
@ 2013-11-06  1:00   ` Simon Horman
  1 sibling, 0 replies; 25+ messages in thread
From: Simon Horman @ 2013-11-06  1:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner, kuznet, jmorris, wensong, ja, edumazet, pshelar,
	jasowang, alexander.h.duyck, coreteam, fw

On Tue, Nov 05, 2013 at 12:02:11PM +0100, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/skbuff.h          | 11 +++++++++++
>  include/net/ip_vs.h             |  8 --------
>  net/netfilter/ipvs/ip_vs_core.c |  1 +
>  3 files changed, 12 insertions(+), 8 deletions(-)

IPVS portion:

Acked-by: Simon Horman <horms@verge.net.au>


> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2e153b6..ececdad 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2606,6 +2606,17 @@ static inline void nf_conntrack_put_reasm(struct sk_buff *skb)
>  		kfree_skb(skb);
>  }
>  #endif
> +#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
> +static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> +{
> +	return skb->nfct_reasm;
> +}
> +#else
> +static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> +{
> +	return NULL;
> +}
> +#endif
>  #ifdef CONFIG_BRIDGE_NETFILTER
>  static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
>  {
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index cd7275f..6dff2b6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -119,10 +119,6 @@ struct ip_vs_iphdr {
>  
>  /* Dependency to module: nf_defrag_ipv6 */
>  #if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
> -static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> -{
> -	return skb->nfct_reasm;
> -}
>  static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>  				      int len, void *buffer,
>  				      const struct ip_vs_iphdr *ipvsh)
> @@ -134,10 +130,6 @@ static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>  	return skb_header_pointer(skb, offset, len, buffer);
>  }
>  #else
> -static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
> -{
> -	return NULL;
> -}
>  static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
>  				      int len, void *buffer,
>  				      const struct ip_vs_iphdr *ipvsh)
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 34fda62..085c242 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -43,6 +43,7 @@
>  #include <net/ip6_checksum.h>
>  #include <net/netns/generic.h>		/* net_generic() */
>  
> +#include <linux/skbuff.h>
>  #include <linux/netfilter.h>
>  #include <linux/netfilter_ipv4.h>
>  
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-05 18:16         ` Patrick McHardy
  2013-11-05 20:55           ` Jiri Pirko
@ 2013-11-06 14:18           ` Jiri Pirko
  2013-11-06 14:33             ` Florian Westphal
  1 sibling, 1 reply; 25+ messages in thread
From: Jiri Pirko @ 2013-11-06 14:18 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

>> >So if someone wants to change this, simply *only* pass the reassembled
>> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> 
>> This is unfortunatelly not possible because in forwarding use case, the
>> fragments have to be send out as they come in.
>
>No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>and we still respect the original fragment sizes, thus are not responsible
>for potentially exceeding the PMTU on the following path.

Can you please point where this is done. Where the original fragment
sizes are stored and in which code are they restored? Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-06 14:18           ` Jiri Pirko
@ 2013-11-06 14:33             ` Florian Westphal
  2013-11-06 14:44               ` Jiri Pirko
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2013-11-06 14:33 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Patrick McHardy, Florian Westphal, netdev, davem, pablo,
	netfilter-devel, yoshfuji, kadlec, mleitner, kuznet, jmorris,
	wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam

Jiri Pirko <jiri@resnulli.us> wrote:
> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> 
> >> This is unfortunatelly not possible because in forwarding use case, the
> >> fragments have to be send out as they come in.
> >
> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >and we still respect the original fragment sizes, thus are not responsible
> >for potentially exceeding the PMTU on the following path.
> 
> Can you please point where this is done. Where the original fragment
> sizes are stored and in which code are they restored? Thanks.

Patrick is probably talking about

commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
which introduces 'frag_max_size' in inet6_skb_parm struct.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-06 14:33             ` Florian Westphal
@ 2013-11-06 14:44               ` Jiri Pirko
  2013-11-06 14:51                 ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Pirko @ 2013-11-06 14:44 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Patrick McHardy, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
>Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >So if someone wants to change this, simply *only* pass the reassembled
>> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> >> 
>> >> This is unfortunatelly not possible because in forwarding use case, the
>> >> fragments have to be send out as they come in.
>> >
>> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>> >and we still respect the original fragment sizes, thus are not responsible
>> >for potentially exceeding the PMTU on the following path.
>> 
>> Can you please point where this is done. Where the original fragment
>> sizes are stored and in which code are they restored? Thanks.
>
>Patrick is probably talking about
>
>commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
>(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
>which introduces 'frag_max_size' in inet6_skb_parm struct.

Thanks for the pointer. Interestingly though, according to my testing,
if reassembled packet would fit into outdev mtu, it is not fragmented
to the original frag size and it is send as single big packet. That is
I believe not correct.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-06 14:44               ` Jiri Pirko
@ 2013-11-06 14:51                 ` Patrick McHardy
  2013-11-06 15:29                   ` Jiri Pirko
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2013-11-06 14:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote:
> Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
> >Jiri Pirko <jiri@resnulli.us> wrote:
> >> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> >> 
> >> >> This is unfortunatelly not possible because in forwarding use case, the
> >> >> fragments have to be send out as they come in.
> >> >
> >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >> >and we still respect the original fragment sizes, thus are not responsible
> >> >for potentially exceeding the PMTU on the following path.
> >> 
> >> Can you please point where this is done. Where the original fragment
> >> sizes are stored and in which code are they restored? Thanks.
> >
> >Patrick is probably talking about
> >
> >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
> >(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
> >which introduces 'frag_max_size' in inet6_skb_parm struct.

Indeed.

> Thanks for the pointer. Interestingly though, according to my testing,
> if reassembled packet would fit into outdev mtu, it is not fragmented
> to the original frag size and it is send as single big packet. That is
> I believe not correct.

Hmm right, that wasn't the intention. We currently only use frag_max_size
to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger
fragmentation itself.

We probably need to handle this in ip6_finish_output(). Would you mind
fixing this up as well in your patches?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-06 14:51                 ` Patrick McHardy
@ 2013-11-06 15:29                   ` Jiri Pirko
  2013-11-06 16:12                     ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Pirko @ 2013-11-06 15:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Florian Westphal, netdev, davem, pablo, netfilter-devel, yoshfuji,
	kadlec, mleitner, kuznet, jmorris, wensong, horms, ja, edumazet,
	pshelar, jasowang, alexander.h.duyck, coreteam

Wed, Nov 06, 2013 at 03:51:04PM CET, kaber@trash.net wrote:
>On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote:
>> Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >> >So if someone wants to change this, simply *only* pass the reassembled
>> >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> >> >> 
>> >> >> This is unfortunatelly not possible because in forwarding use case, the
>> >> >> fragments have to be send out as they come in.
>> >> >
>> >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>> >> >and we still respect the original fragment sizes, thus are not responsible
>> >> >for potentially exceeding the PMTU on the following path.
>> >> 
>> >> Can you please point where this is done. Where the original fragment
>> >> sizes are stored and in which code are they restored? Thanks.
>> >
>> >Patrick is probably talking about
>> >
>> >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
>> >(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
>> >which introduces 'frag_max_size' in inet6_skb_parm struct.
>
>Indeed.
>
>> Thanks for the pointer. Interestingly though, according to my testing,
>> if reassembled packet would fit into outdev mtu, it is not fragmented
>> to the original frag size and it is send as single big packet. That is
>> I believe not correct.
>
>Hmm right, that wasn't the intention. We currently only use frag_max_size
>to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger
>fragmentation itself.
>
>We probably need to handle this in ip6_finish_output(). Would you mind
>fixing this up as well in your patches?

I'm working on it atm.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching
  2013-11-06 15:29                   ` Jiri Pirko
@ 2013-11-06 16:12                     ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2013-11-06 16:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Patrick McHardy, Florian Westphal, netdev, davem, pablo,
	netfilter-devel, yoshfuji, kadlec, mleitner, kuznet, jmorris,
	wensong, horms, ja, edumazet, pshelar, jasowang,
	alexander.h.duyck, coreteam

On Wed, 2013-11-06 at 16:29 +0100, Jiri Pirko wrote:

> I'm working on it atm.

Note that we have a similar problem in general with GRO and forwading.

We do no check against output mtu, if the packet is GSO.

skb_is_gso() being true bypasses the safety checks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-11-06 16:12 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-05 11:02 [patch net-next 0/3] couple of reasm fixes Jiri Pirko
2013-11-05 11:02 ` [patch net-next 1/3] move skb_nfct_reasm into skbuff.h Jiri Pirko
2013-11-05 11:50   ` Marcelo Ricardo Leitner
2013-11-06  1:00   ` Simon Horman
2013-11-05 11:02 ` [patch net-next 2/3] netfilter: ip6_tables: use reasm skb for matching Jiri Pirko
2013-11-05 11:50   ` Marcelo Ricardo Leitner
2013-11-05 13:32   ` Florian Westphal
2013-11-05 13:41     ` Patrick McHardy
2013-11-05 15:01       ` Jiri Pirko
2013-11-05 15:39         ` Florian Westphal
2013-11-05 18:19           ` Patrick McHardy
2013-11-05 18:21             ` Jiri Pirko
2013-11-05 18:16         ` Patrick McHardy
2013-11-05 20:55           ` Jiri Pirko
2013-11-05 22:02             ` Patrick McHardy
2013-11-06 14:18           ` Jiri Pirko
2013-11-06 14:33             ` Florian Westphal
2013-11-06 14:44               ` Jiri Pirko
2013-11-06 14:51                 ` Patrick McHardy
2013-11-06 15:29                   ` Jiri Pirko
2013-11-06 16:12                     ` Eric Dumazet
2013-11-05 11:02 ` [patch net-next 3/3] fix skb_morph to preserve skb->sk and skb->destructor pointers Jiri Pirko
2013-11-05 11:50   ` Marcelo Ricardo Leitner
2013-11-05 14:06   ` Eric Dumazet
2013-11-05 14:47     ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).