[PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
@ 2015-05-22 14:32 Florian Westphal
  2015-05-22 14:32 ` [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper Florian Westphal
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Florian Westphal @ 2015-05-22 14:32 UTC (permalink / raw)
  To: netdev; +Cc: hannes

output path tests:

    if (skb->len > mtu) ip_fragment()

This breaks connectivity in one corner case:
 If the skb was reassembled, but has the DF bit set and ..
 .. its reassembled size is <= outdev mtu ..
 .. we will forward a DF packet larger than what the sender
    transmitted on wire.

If a router later in the path can't forward this packet, it will send an
icmp error in response to an mtu that the original sender never exceeded.

This changes ipv4 defrag/output path to

a) force refragmentation for DF reassembled skbs and
b) set DF bit on all fragments when refragmenting if it was set on original
frags.

tested via:
#!/usr/bin/python
from scapy.all import *
dip="10.23.42.2"
payload="A"*1400
packet=IP(dst=dip,id=12345,flags='DF')/UDP(sport=42,dport=42)/payload
frags=fragment(packet,fragsize=1200)
for fragment in frags:
    send(fragment)

Without this patch, we generate fragments without df bit set based
on the outgoing device mtu when fragmenting after forwarding, ie.

IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
    192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
    192.168.7.1 > 10.23.42.2: ip-proto-17

on ingress will either turn into

IP (ttl 63, id 12345, offset 0, flags [+], proto UDP (17), length 1396)
    192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
IP (ttl 63, id 12345, offset 1376, flags [none], proto UDP (17), length 52)

(mtu 1400: We strip df and send larger fragment), or

IP (ttl 63, id 12345, offset 0, flags [DF], proto UDP (17), length 1428)
    192.168.7.1.42 > 10.23.42.2.42: [udp sum ok] UDP, length 1400

if mtu is 1500.  And in this case things break; router with a smaller mtu
will send icmp error, but original sender only sent packets <= 1204 byte.

With patch, we keep intent of such fragments and will emit DF-fragments
that won't exceed 1204 byte in size.

Joint work with Hannes Frederic Sowa.

Changes since v2:
 - split unrelated patches from series
 - rework changelog of patch #2 to better illustrate breakage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper
  2015-05-22 14:32 [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs Florian Westphal
@ 2015-05-22 14:32 ` Florian Westphal
  2015-05-22 14:43   ` Hannes Frederic Sowa
  2015-05-22 14:32 ` [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet Florian Westphal
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2015-05-22 14:32 UTC (permalink / raw)
  To: netdev; +Cc: hannes, Florian Westphal

ip_skb_dst_mtu is small inline helper, but its called in several places.

before: 17061      44       0   17105    42d1 net/ipv4/ip_output.o
after:  16805      44       0   16849    41d1 net/ipv4/ip_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/ip_output.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8d91b92..e84a389 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -84,6 +84,7 @@ int sysctl_ip_default_ttl __read_mostly = IPDEFTTL;
 EXPORT_SYMBOL(sysctl_ip_default_ttl);
 
 static int ip_fragment(struct sock *sk, struct sk_buff *skb,
+		       unsigned int mtu,
 		       int (*output)(struct sock *, struct sk_buff *));
 
 /* Generate a checksum for an outgoing IP datagram. */
@@ -219,7 +220,8 @@ static inline int ip_finish_output2(struct sock *sk, struct sk_buff *skb)
 	return -EINVAL;
 }
 
-static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb)
+static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,
+				unsigned int mtu)
 {
 	netdev_features_t features;
 	struct sk_buff *segs;
@@ -227,7 +229,7 @@ static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb)
 
 	/* common case: locally created skb or seglen is <= mtu */
 	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
-	      skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
+	      skb_gso_network_seglen(skb) <= mtu)
 		return ip_finish_output2(sk, skb);
 
 	/* Slowpath -  GSO segment length is exceeding the dst MTU.
@@ -251,7 +253,7 @@ static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb)
 		int err;
 
 		segs->next = NULL;
-		err = ip_fragment(sk, segs, ip_finish_output2);
+		err = ip_fragment(sk, segs, mtu, ip_finish_output2);
 
 		if (err && ret == 0)
 			ret = err;
@@ -263,6 +265,8 @@ static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb)
 
 static int ip_finish_output(struct sock *sk, struct sk_buff *skb)
 {
+	unsigned int mtu;
+
 #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
 	/* Policy lookup after SNAT yielded a new policy */
 	if (skb_dst(skb)->xfrm) {
@@ -270,11 +274,12 @@ static int ip_finish_output(struct sock *sk, struct sk_buff *skb)
 		return dst_output_sk(sk, skb);
 	}
 #endif
+	mtu = ip_skb_dst_mtu(skb);
 	if (skb_is_gso(skb))
-		return ip_finish_output_gso(sk, skb);
+		return ip_finish_output_gso(sk, skb, mtu);
 
-	if (skb->len > ip_skb_dst_mtu(skb))
-		return ip_fragment(sk, skb, ip_finish_output2);
+	if (skb->len > mtu)
+		return ip_fragment(sk, skb, mtu, ip_finish_output2);
 
 	return ip_finish_output2(sk, skb);
 }
@@ -482,10 +487,10 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 }
 
 static int ip_fragment(struct sock *sk, struct sk_buff *skb,
+		       unsigned int mtu,
 		       int (*output)(struct sock *, struct sk_buff *))
 {
 	struct iphdr *iph = ip_hdr(skb);
-	unsigned int mtu = ip_skb_dst_mtu(skb);
 
 	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
 		     (IPCB(skb)->frag_max_size &&
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet
  2015-05-22 14:32 [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs Florian Westphal
  2015-05-22 14:32 ` [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper Florian Westphal
@ 2015-05-22 14:32 ` Florian Westphal
  2015-05-22 14:45   ` Hannes Frederic Sowa
  2015-05-22 19:03 ` [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs David Miller
  2015-05-27 17:04 ` David Miller
  3 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2015-05-22 14:32 UTC (permalink / raw)
  To: netdev; +Cc: hannes, Florian Westphal

We currently always send fragments without DF bit set.

Thus, given following setup:

mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
   A           R1              R2         B

Where R1 and R2 run linux with netfilter defragmentation/conntrack
enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
will respond with icmp too big error if one of these fragments exceeded
1400 bytes.

However, if R1 receives fragment sizes 1200 and 100, it would
forward the reassembled packet without refragmenting, i.e.
R2 will send an icmp error in response to a packet that was never sent,
citing mtu that the original sender never exceeded.

The other minor issue is that a refragmentation on R1 will conceal the
MTU of R2-B since refragmentation does not set DF bit on the fragments.

This modifies ip_fragment so that we track largest fragment size seen
both for DF and non-DF packets, and set frag_max_size to the largest
value.

If the DF fragment size is larger or equal to the non-df one, we will
consider the packet a path mtu probe:
We set DF bit on the reassembled skb and also tag it with a new IPCB flag
to force refragmentation even if skb fits outdev mtu.

We will also set DF bit on each fragment in this case.

Joint work with Hannes Frederic Sowa.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/inet_frag.h |  2 +-
 include/net/ip.h        |  1 +
 net/ipv4/ip_fragment.c  | 31 ++++++++++++++++++++++++++-----
 net/ipv4/ip_output.c    | 12 ++++++++++--
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 8d17655..e1300b3 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -43,7 +43,7 @@ enum {
  * @len: total length of the original datagram
  * @meat: length of received fragments so far
  * @flags: fragment queue flags
- * @max_size: (ipv4 only) maximum received fragment size with IP_DF set
+ * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
  */
 struct inet_frag_queue {
diff --git a/include/net/ip.h b/include/net/ip.h
index 7921a36..9b976cf 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -45,6 +45,7 @@ struct inet_skb_parm {
 #define IPSKB_FRAG_COMPLETE	BIT(3)
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
+#define IPSKB_FRAG_PMTU		BIT(6)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 47fa64e..a50dc6d 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -75,6 +75,7 @@ struct ipq {
 	__be16		id;
 	u8		protocol;
 	u8		ecn; /* RFC3168 support */
+	u16		max_df_size; /* largest frag with DF set seen */
 	int             iif;
 	unsigned int    rid;
 	struct inet_peer *peer;
@@ -326,6 +327,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
 	struct sk_buff *prev, *next;
 	struct net_device *dev;
+	unsigned int fragsize;
 	int flags, offset;
 	int ihl, end;
 	int err = -ENOENT;
@@ -481,9 +483,14 @@ found:
 	if (offset == 0)
 		qp->q.flags |= INET_FRAG_FIRST_IN;
 
+	fragsize = skb->len + ihl;
+
+	if (fragsize > qp->q.max_size)
+		qp->q.max_size = fragsize;
+
 	if (ip_hdr(skb)->frag_off & htons(IP_DF) &&
-	    skb->len + ihl > qp->q.max_size)
-		qp->q.max_size = skb->len + ihl;
+	    fragsize > qp->max_df_size)
+		qp->max_df_size = fragsize;
 
 	if (qp->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
 	    qp->q.meat == qp->q.len) {
@@ -613,13 +620,27 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	head->next = NULL;
 	head->dev = dev;
 	head->tstamp = qp->q.stamp;
-	IPCB(head)->frag_max_size = qp->q.max_size;
+	IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
 
 	iph = ip_hdr(head);
-	/* max_size != 0 implies at least one fragment had IP_DF set */
-	iph->frag_off = qp->q.max_size ? htons(IP_DF) : 0;
 	iph->tot_len = htons(len);
 	iph->tos |= ecn;
+
+	/* When we set IP_DF on a refragmented skb we must also force a
+	 * call to ip_fragment to avoid forwarding a DF-skb of size s while
+	 * original sender only sent fragments of size f (where f < s).
+	 *
+	 * We only set DF/IPSKB_FRAG_PMTU if such DF fragment was the largest
+	 * frag seen to avoid sending tiny DF-fragments in case skb was built
+	 * from one very small df-fragment and one large non-df frag.
+	 */
+	if (qp->max_df_size == qp->q.max_size) {
+		IPCB(head)->flags |= IPSKB_FRAG_PMTU;
+		iph->frag_off = htons(IP_DF);
+	} else {
+		iph->frag_off = 0;
+	}
+
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
 	qp->q.fragments = NULL;
 	qp->q.fragments_tail = NULL;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e84a389..9b55ef6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -278,7 +278,7 @@ static int ip_finish_output(struct sock *sk, struct sk_buff *skb)
 	if (skb_is_gso(skb))
 		return ip_finish_output_gso(sk, skb, mtu);
 
-	if (skb->len > mtu)
+	if (skb->len > mtu || (IPCB(skb)->flags & IPSKB_FRAG_PMTU))
 		return ip_fragment(sk, skb, mtu, ip_finish_output2);
 
 	return ip_finish_output2(sk, skb);
@@ -492,7 +492,10 @@ static int ip_fragment(struct sock *sk, struct sk_buff *skb,
 {
 	struct iphdr *iph = ip_hdr(skb);
 
-	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
+	if ((iph->frag_off & htons(IP_DF)) == 0)
+		return ip_do_fragment(sk, skb, output);
+
+	if (unlikely(!skb->ignore_df ||
 		     (IPCB(skb)->frag_max_size &&
 		      IPCB(skb)->frag_max_size > mtu))) {
 		struct rtable *rt = skb_rtable(skb);
@@ -537,6 +540,8 @@ int ip_do_fragment(struct sock *sk, struct sk_buff *skb,
 	iph = ip_hdr(skb);
 
 	mtu = ip_skb_dst_mtu(skb);
+	if (IPCB(skb)->frag_max_size && IPCB(skb)->frag_max_size < mtu)
+		mtu = IPCB(skb)->frag_max_size;
 
 	/*
 	 *	Setup starting values.
@@ -732,6 +737,9 @@ slow_path:
 		iph = ip_hdr(skb2);
 		iph->frag_off = htons((offset >> 3));
 
+		if (IPCB(skb)->flags & IPSKB_FRAG_PMTU)
+			iph->frag_off |= htons(IP_DF);
+
 		/* ANK: dirty, but effective trick. Upgrade options only if
 		 * the segment to be fragmented was THE FIRST (otherwise,
 		 * options are already fixed) and make it ONCE
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper
  2015-05-22 14:32 ` [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper Florian Westphal
@ 2015-05-22 14:43   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22 14:43 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Fr, 2015-05-22 at 16:32 +0200, Florian Westphal wrote:
> ip_skb_dst_mtu is small inline helper, but its called in several places.
> 
> before: 17061      44       0   17105    42d1 net/ipv4/ip_output.o
> after:  16805      44       0   16849    41d1 net/ipv4/ip_output.o
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

This also does reduce some dst_mtu === dst->ops->mtu() indirect calls,
maybe also giving a tiny speed-up. :)

Bye,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet
  2015-05-22 14:32 ` [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet Florian Westphal
@ 2015-05-22 14:45   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22 14:45 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Fr, 2015-05-22 at 16:32 +0200, Florian Westphal wrote:
> We currently always send fragments without DF bit set.
> 
> Thus, given following setup:
> 
> mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
>    A           R1              R2         B
> 
> Where R1 and R2 run linux with netfilter defragmentation/conntrack
> enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
> will respond with icmp too big error if one of these fragments exceeded
> 1400 bytes.
> 
> However, if R1 receives fragment sizes 1200 and 100, it would
> forward the reassembled packet without refragmenting, i.e.
> R2 will send an icmp error in response to a packet that was never sent,
> citing mtu that the original sender never exceeded.
> 
> The other minor issue is that a refragmentation on R1 will conceal the
> MTU of R2-B since refragmentation does not set DF bit on the fragments.
> 
> This modifies ip_fragment so that we track largest fragment size seen
> both for DF and non-DF packets, and set frag_max_size to the largest
> value.
> 
> If the DF fragment size is larger or equal to the non-df one, we will
> consider the packet a path mtu probe:
> We set DF bit on the reassembled skb and also tag it with a new IPCB flag
> to force refragmentation even if skb fits outdev mtu.
> 
> We will also set DF bit on each fragment in this case.
> 
> Joint work with Hannes Frederic Sowa.
> 
> Reported-by: Jesse Gross <jesse@nicira.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

And also:
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-22 14:32 [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs Florian Westphal
  2015-05-22 14:32 ` [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper Florian Westphal
  2015-05-22 14:32 ` [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet Florian Westphal
@ 2015-05-22 19:03 ` David Miller
  2015-05-22 19:26   ` Florian Westphal
  2015-05-22 22:52   ` Hannes Frederic Sowa
  2015-05-27 17:04 ` David Miller
  3 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2015-05-22 19:03 UTC (permalink / raw)
  To: fw; +Cc: netdev, hannes

From: Florian Westphal <fw@strlen.de>
Date: Fri, 22 May 2015 16:32:49 +0200

> IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
>     192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
> IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
>     192.168.7.1 > 10.23.42.2: ip-proto-17

I almost consider a fragment with DF set an oxymoron.

How does this happen?

DF is a mechanism for PMTU discovery.  Therefore it is the end nodes
which set DF on packets before fragmentation can even take place.
Since DF is set initially, the logical consequence is that the packet
cannot be legally fragmented along the way.

Are tunnels doing something to create this situation?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-22 19:03 ` [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs David Miller
@ 2015-05-22 19:26   ` Florian Westphal
  2015-05-26  9:57     ` Maxime Bizon
  2015-05-22 22:52   ` Hannes Frederic Sowa
  1 sibling, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2015-05-22 19:26 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netdev, hannes, Maxime Bizon

David Miller <davem@davemloft.net> wrote:

[ cc'd Maxime Bizon ]

> From: Florian Westphal <fw@strlen.de>
> Date: Fri, 22 May 2015 16:32:49 +0200
> 
> > IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
> >     192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
> > IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
> >     192.168.7.1 > 10.23.42.2: ip-proto-17
> 
> I almost consider a fragment with DF set an oxymoron.
> 
> How does this happen?

Good question.  I have no idea why or how this happens.

But it does happen, see e.g. following bug report:
http://marc.info/?l=linux-netdev&m=139870308431986&w=2

Maxime, do you recall what type of traffic generates
the DF-fragments you reported?

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-22 19:03 ` [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs David Miller
  2015-05-22 19:26   ` Florian Westphal
@ 2015-05-22 22:52   ` Hannes Frederic Sowa
  1 sibling, 0 replies; 11+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22 22:52 UTC (permalink / raw)
  To: David Miller, Florian Westphal; +Cc: netdev

On Fri, May 22, 2015, at 21:03, David Miller wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Fri, 22 May 2015 16:32:49 +0200
> 
> > IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
> >     192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
> > IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
> >     192.168.7.1 > 10.23.42.2: ip-proto-17
> 
> I almost consider a fragment with DF set an oxymoron.
> 
> How does this happen?
> 
> DF is a mechanism for PMTU discovery.  Therefore it is the end nodes
> which set DF on packets before fragmentation can even take place.
> Since DF is set initially, the logical consequence is that the packet
> cannot be legally fragmented along the way.
> 
> Are tunnels doing something to create this situation?

Hmm, interesting...

I think openvswitch still has this behavior that it always sets
ignore_df but DF bit control is up to user space.
Another situation where this can arise if a local socket with
ip_sk_ignore_df(sk) = 1 (we keep socket reference as long as possible ;)
) sends packets into a tunnel, like gre, where pmtudisc is enabled. It
should also look like those packets. This is because we don't clear
ignore_df during passing down the skb.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-22 19:26   ` Florian Westphal
@ 2015-05-26  9:57     ` Maxime Bizon
  2015-05-26 14:50       ` Florian Westphal
  0 siblings, 1 reply; 11+ messages in thread
From: Maxime Bizon @ 2015-05-26  9:57 UTC (permalink / raw)
  To: Florian Westphal; +Cc: David Miller, netdev, hannes

On Fri, 2015-05-22 at 21:26 +0200, Florian Westphal wrote:

Hello,

> But it does happen, see e.g. following bug report:
> http://marc.info/?l=linux-netdev&m=139870308431986&w=2
> 
> Maxime, do you recall what type of traffic generates
> the DF-fragments you reported?

Yep

We are an ISP and provide our own home gateway to the subscribers, which
ends up routing traffic of a large range of end user devices.

In that case, the frag+DF traffic was seen in an exchange between a
femtocell and a femto GW during the IPsec IKE exchange, more precisely
on the IKE_AUTH message sent from the femto GW.

You can contact me privately if you need more details.

-- 
Maxime

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-26  9:57     ` Maxime Bizon
@ 2015-05-26 14:50       ` Florian Westphal
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2015-05-26 14:50 UTC (permalink / raw)
  To: Maxime Bizon; +Cc: Florian Westphal, David Miller, netdev, hannes

Maxime Bizon <mbizon@freebox.fr> wrote:
> On Fri, 2015-05-22 at 21:26 +0200, Florian Westphal wrote:
> > But it does happen, see e.g. following bug report:
> > http://marc.info/?l=linux-netdev&m=139870308431986&w=2
> > 
> > Maxime, do you recall what type of traffic generates
> > the DF-fragments you reported?
> 
> Yep
> 
> We are an ISP and provide our own home gateway to the subscribers, which
> ends up routing traffic of a large range of end user devices.
> 
> In that case, the frag+DF traffic was seen in an exchange between a
> femtocell and a femto GW during the IPsec IKE exchange, more precisely
> on the IKE_AUTH message sent from the femto GW.

Thanks, so it seems its used to push udp frag/defrag operation to end
hosts.

> You can contact me privately if you need more details.

Its enough for me to know that this isn't random fluke, thanks.

Dave, if you disagree, one possibility would be to strip DF bit on
defrag/refrag when forwarding.

However, I think that we should respect end host "wish", i.e. reject too
big df fragment and also re-set DF on refrag so we don't conceal lower
mtu in the network.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs
  2015-05-22 14:32 [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs Florian Westphal
                   ` (2 preceding siblings ...)
  2015-05-22 19:03 ` [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs David Miller
@ 2015-05-27 17:04 ` David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2015-05-27 17:04 UTC (permalink / raw)
  To: fw; +Cc: netdev, hannes

From: Florian Westphal <fw@strlen.de>
Date: Fri, 22 May 2015 16:32:49 +0200

> output path tests:
> 
>     if (skb->len > mtu) ip_fragment()
> 
> This breaks connectivity in one corner case:
>  If the skb was reassembled, but has the DF bit set and ..
>  .. its reassembled size is <= outdev mtu ..
>  .. we will forward a DF packet larger than what the sender
>     transmitted on wire.
> 
> If a router later in the path can't forward this packet, it will send an
> icmp error in response to an mtu that the original sender never exceeded.
> 
> This changes ipv4 defrag/output path to
> 
> a) force refragmentation for DF reassembled skbs and
> b) set DF bit on all fragments when refragmenting if it was set on original
> frags.
 ...

Series applied, thanks Florian.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-05-27 17:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-22 14:32 [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs Florian Westphal
2015-05-22 14:32 ` [PATCH -next 1/2] net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper Florian Westphal
2015-05-22 14:43   ` Hannes Frederic Sowa
2015-05-22 14:32 ` [PATCH -next 2/2] ip_fragment: don't forward defragmented DF packet Florian Westphal
2015-05-22 14:45   ` Hannes Frederic Sowa
2015-05-22 19:03 ` [PATCH -next, V3 0/2] net: force refragmentation for DF reassembed skbs David Miller
2015-05-22 19:26   ` Florian Westphal
2015-05-26  9:57     ` Maxime Bizon
2015-05-26 14:50       ` Florian Westphal
2015-05-22 22:52   ` Hannes Frederic Sowa
2015-05-27 17:04 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).