netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu
@ 2016-08-22 12:06 Shmulik Ladkani
  2016-08-22 12:58 ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Shmulik Ladkani @ 2016-08-22 12:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Shmulik Ladkani, Hannes Frederic Sowa, Eric Dumazet,
	Florian Westphal

There are cases where gso skbs (which originate from an ingress
interface) have a gso_size value that exceeds the output dst mtu:

 - ipv4 forwarding middlebox having in/out interfaces with different mtus
   addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
 - bridge having a tunnel member interface stacked over a device with small mtu
   addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs'

In both cases, such skbs are identified, then go through early software
segmentation+fragmentation as part of ip_finish_output_gso.

Another approach is to shrink the gso_size to a value suitable so
resulting segments are smaller than dst mtu, as suggeted by Eric
Dumazet (as part of [1]) and Florian Westphal (as part of [2]).

This will void the need for software segmentation/fragmentation at
ip_finish_output_gso, thus significantly improve throughput and lower
cpu load.

This RFC patch attempts to implement this gso_size clamping.

[1] https://patchwork.ozlabs.org/patch/314327/
[2] https://patchwork.ozlabs.org/patch/644724/

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---

 Comments welcome.

 Few questions embedded in the patch.

 Florian, in fe6cc55f you described a BUG due to gso_size decrease.
 I've tested both bridged and routed cases, but in my setups failed to
 hit the issue; Appreciate if you can provide some hints.

 net/ipv4/ip_output.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index dde37fb..b911b43 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,40 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	return -EINVAL;
 }
 
+static inline int skb_gso_clamp(struct sk_buff *skb, unsigned int network_len)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	unsigned short gso_size;
+	unsigned int seglen;
+
+	if (shinfo->gso_size == GSO_BY_FRAGS)
+		return -EINVAL;
+
+	seglen = skb_gso_network_seglen(skb);
+
+	/* Decrease gso_size to fit specified network length */
+	gso_size = shinfo->gso_size - (seglen - network_len);
+	if (shinfo->gso_type & SKB_GSO_UDP)
+		gso_size &= ~7;
+
+	if (!(shinfo->gso_type & SKB_GSO_DODGY)) {
+		/* Recalculate gso_segs for skbs of trusted sources.
+		 * Untrusted skbs will have their gso_segs calculated by
+		 * skb_gso_segment.
+		 */
+		unsigned int hdr_len, payload_len;
+
+		hdr_len = seglen - shinfo->gso_size;
+		payload_len = skb->len - hdr_len;
+		shinfo->gso_segs = DIV_ROUND_UP(payload_len, gso_size);
+
+		// Q: need to verify gso_segs <= dev->gso_max_segs?
+		//    seems to be protected by netif_skb_features
+	}
+	shinfo->gso_size = gso_size;
+	return 0;
+}
+
 static int ip_finish_output_gso(struct net *net, struct sock *sk,
 				struct sk_buff *skb, unsigned int mtu)
 {
@@ -237,6 +271,16 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 	 * from host network stack.
 	 */
+
+	/* Attempt to clamp gso_size to avoid segmenting and fragmenting.
+	 *
+	 * Q: policy needed? per device?
+	 */
+	if (sysctl_ip_output_gso_clamp) {
+		if (!skb_gso_clamp(skb, mtu))
+			return ip_finish_output2(net, sk, skb);
+	}
+
 	features = netif_skb_features(skb);
 	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-09-09  5:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-22 12:06 [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu Shmulik Ladkani
2016-08-22 12:58 ` Florian Westphal
2016-08-22 13:05   ` Shmulik Ladkani
2016-08-24 14:53   ` Shmulik Ladkani
2016-08-24 14:59     ` Florian Westphal
2016-08-25  9:05   ` Shmulik Ladkani
2016-08-26 11:19     ` Herbert Xu
2016-09-09  5:48     ` Shmulik Ladkani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).