netdev.vger.kernel.org archive mirror
* [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs
From: Shmulik Ladkani @ 2016-07-05 12:35 UTC
  To: David S. Miller
  Cc: Florian Westphal, Eric Dumazet, Hannes Frederic Sowa,
	shmulik.ladkani, netdev, Shmulik Ladkani

Given:
 - tap0, vxlan0 enslaved under a bridge
 - eth0 is the tunnel underlay having small mtu (e.g. 1400)

Assume GSO skbs arrive from tap0 with a gso_size determined by the
user-provided virtio_net_hdr (e.g. 1460, corresponding to a VM mtu of 1500).

After encapsulation, these skbs have an skb_gso_network_seglen that
exceeds the underlay ip_skb_dst_mtu.
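
To make the numbers above concrete, here is a rough sketch of the
per-segment length after VXLAN encapsulation. It assumes IPv4 for both
inner and outer headers, no IP or TCP options, and plain VXLAN over UDP;
encap_network_seglen() is a hypothetical helper for illustration, not a
kernel function.

```c
/* Header sizes in bytes. Assumptions: IPv4 without options,
 * TCP without options, VXLAN carried over UDP. */
enum {
	OUTER_IPV4_HDR = 20,
	OUTER_UDP_HDR  = 8,
	VXLAN_HDR      = 8,
	INNER_ETH_HDR  = 14,
	INNER_IPV4_HDR = 20,
	INNER_TCP_HDR  = 20,
};

/* Hypothetical helper: length of one final segment, measured from the
 * start of the outer IP header, for a given gso_size. */
unsigned int encap_network_seglen(unsigned int gso_size)
{
	return OUTER_IPV4_HDR + OUTER_UDP_HDR + VXLAN_HDR +
	       INNER_ETH_HDR + INNER_IPV4_HDR + INNER_TCP_HDR + gso_size;
}
```

Under these assumptions a gso_size of 1460 gives 1550 bytes per segment,
which is above the example underlay mtu of 1400.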

These skbs are accidentally passed to ip_finish_output2 AS IS; however
each final segment (either segmented by validate_xmit_skb of eth0, or
by eth0 hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain underlay
networks.

The expected behavior in such a setup would be segmenting the skb first,
and then fragmenting each segment according to dst mtu, and finally
passing the resulting fragments to ip_finish_output2.
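
As an illustration of the "segment first, then fragment" step, the
following sketch estimates how many IPv4 fragments one segment would be
split into. It is a simplified model, not the kernel's ip_fragment logic:
ipv4_fragment_count() is a hypothetical name, and it assumes a fixed IP
header length and that every fragment except the last carries a payload
that is a multiple of 8 bytes.

```c
/* Hypothetical helper: number of IPv4 fragments needed for one segment
 * of 'seglen' bytes (IP header included) on a path with the given mtu.
 * 'ihl' is the IP header length in bytes (20 without options).
 * Returns 0 if the mtu leaves no room for any payload. */
unsigned int ipv4_fragment_count(unsigned int seglen, unsigned int mtu,
				 unsigned int ihl)
{
	unsigned int payload, per_frag;

	if (seglen <= mtu)
		return 1;		/* fits as is, no fragmentation */
	if (mtu <= ihl)
		return 0;

	/* Non-final fragments carry a multiple of 8 payload bytes. */
	per_frag = (mtu - ihl) & ~7u;
	if (per_frag == 0)
		return 0;

	payload = seglen - ihl;
	return (payload + per_frag - 1) / per_frag;
}
```

With a 1550-byte segment, an mtu of 1400 and a 20-byte IP header, this
model yields two fragments per segment.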

'ip_finish_output_gso' already supports this "Slowpath" behavior,
but it is only considered if IPSKB_FORWARDED is set.

However, in the bridged case IPSKB_FORWARDED is off, so the "Slowpath"
behavior is not considered.

Fix this by performing the ip_finish_output_gso "Slowpath" even for
non-IPSKB_FORWARDED skbs.

This is also OK for locally created skbs, as they are likely to have an
skb_gso_network_seglen that equals the dst mtu, and thus will go directly
to 'ip_finish_output2' as was done prior to this fix.
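
The effect of the change on the fast-path decision can be modelled in
user space as below. This is only a sketch: struct skb_model and the
three functions are hypothetical stand-ins for the IPSKB_FORWARDED flag
test and for skb_gso_validate_mtu(), not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two skb properties that matter here. */
struct skb_model {
	bool forwarded;			/* models IPSKB_FORWARDED */
	unsigned int network_seglen;	/* models skb_gso_network_seglen() */
};

/* Models skb_gso_validate_mtu(): true if each segment fits the mtu. */
bool fits_mtu(const struct skb_model *skb, unsigned int mtu)
{
	return skb->network_seglen <= mtu;
}

/* Condition before the patch: non-forwarded skbs always take the
 * fast path, whatever their segment length. */
bool fast_path_before(const struct skb_model *skb, unsigned int mtu)
{
	return !skb->forwarded || fits_mtu(skb, mtu);
}

/* Condition after the patch: only the segment length decides. */
bool fast_path_after(const struct skb_model *skb, unsigned int mtu)
{
	return fits_mtu(skb, mtu);
}
```

For a bridged skb (forwarded unset) with a 1550-byte segment length and
an mtu of 1400, the old condition selects the fast path while the new one
selects the "Slowpath"; an skb whose segments fit the mtu takes the fast
path under both.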

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
---
 net/ipv4/ip_output.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index cbac493..8ae65b3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -223,9 +223,8 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: locally created skb or seglen is <= mtu */
-	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
-	      skb_gso_validate_mtu(skb, mtu))
+	/* common case: seglen is <= mtu */
+	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath -  GSO segment length is exceeding the dst MTU.
-- 
1.9.1


Thread overview: 19+ messages
2016-07-05 12:35 [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs Shmulik Ladkani
2016-07-05 13:03 ` Florian Westphal
2016-07-05 14:05   ` Shmulik Ladkani
2016-07-09  3:12     ` David Miller
2016-07-09  9:06       ` Florian Westphal
2016-07-09  9:00     ` Florian Westphal
2016-07-09 12:30       ` Shmulik Ladkani
2016-07-09 13:22         ` Florian Westphal
2016-07-10  7:51           ` Shmulik Ladkani
2016-07-11  8:15             ` Florian Westphal
2016-07-11 13:32               ` Hannes Frederic Sowa
2016-07-12  5:56           ` Shmulik Ladkani
2016-07-13 14:00             ` Shmulik Ladkani
2016-07-14 13:12               ` Hannes Frederic Sowa
2016-07-14 14:13                 ` Shmulik Ladkani
2016-07-14 23:32                   ` Hannes Frederic Sowa
2016-07-10 20:14         ` Shmulik Ladkani
2016-07-11  8:13           ` Florian Westphal
2016-07-09 15:10       ` Hannes Frederic Sowa
