From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shmulik Ladkani Subject: Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu Date: Thu, 25 Aug 2016 12:05:33 +0300 Message-ID: <20160825120533.352bbd1b@pixies> References: <1471867570-1406-1-git-send-email-shmulik.ladkani@gmail.com> <20160822125842.GF6199@breakpoint.cc> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org To: Florian Westphal , "David S. Miller" , Hannes Frederic Sowa , Eric Dumazet , Herbert Xu Return-path: Received: from mail-wm0-f66.google.com ([74.125.82.66]:36018 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933159AbcHYJFl (ORCPT ); Thu, 25 Aug 2016 05:05:41 -0400 Received: by mail-wm0-f66.google.com with SMTP id i138so6432163wmf.3 for ; Thu, 25 Aug 2016 02:05:40 -0700 (PDT) In-Reply-To: <20160822125842.GF6199@breakpoint.cc> Sender: netdev-owner@vger.kernel.org List-ID: Hi, On Mon, 22 Aug 2016 14:58:42 +0200, fw@strlen.de wrote: > Shmulik Ladkani wrote: > > There are cases where gso skbs (which originate from an ingress > > interface) have a gso_size value that exceeds the output dst mtu: > > > > - ipv4 forwarding middlebox having in/out interfaces with different mtus > > addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path' > > - bridge having a tunnel member interface stacked over a device with small mtu > > addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs' > > > > In both cases, such skbs are identified, then go through early software > > segmentation+fragmentation as part of ip_finish_output_gso. > > > > Another approach is to shrink the gso_size to a value suitable so > > resulting segments are smaller than dst mtu, as suggeted by Eric > > Dumazet (as part of [1]) and Florian Westphal (as part of [2]). > > > > This will void the need for software segmentation/fragmentation at > > ip_finish_output_gso, thus significantly improve throughput and lower > > cpu load. > > > > This RFC patch attempts to implement this gso_size clamping. > > > > [1] https://patchwork.ozlabs.org/patch/314327/ > > [2] https://patchwork.ozlabs.org/patch/644724/ > > > > Signed-off-by: Shmulik Ladkani > > --- > > > > Florian, in fe6cc55f you described a BUG due to gso_size decrease. > > I've tested both bridged and routed cases, but in my setups failed to > > hit the issue; Appreciate if you can provide some hints. > > Still get the BUG, I applied this patch on top of net-next. The BUG occurs when GRO occurs on the ingress, and only if GRO merges skbs into the frag_list (OTOH when segments are only placed into frags[] of a single skb, skb_segment succeeds even if gso_size was altered). This is due to an assumption that the frag_list members terminate on exact MSS boundaries (which no longer holds during gso_size clamping). The assumption is dated back to original support of 'frag_list' to skb_segment: 89319d3801 net: Add frag_list support to skb_segment This patch adds limited support for handling frag_list packets in skb_segment. The intention is to support GRO (Generic Receive Offload) packets which will be constructed by chaining normal packets using frag_list. As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. We have few alternatives for gso_size clamping: 1 Fix 'skb_segment' arithmentics to support inputs that do not match the "frag_list members terminate on exact MSS" assumption. 2 Perform gso_size clamping in 'ip_finish_output_gso' for non-GROed skbs. Other usecases will still benefit: (a) packets arriving from virtualized interfaces, e.g. tap and friends; (b) packets arriving from veth of other namespaces (packets are locally generated by TCP stack on a different netns). 3 Ditch the idea, again ;) We can persue (1), especially if there are other benefits doing so. OTOH due to the current complexity of 'skb_segment' this is bit risky. Going with (2) could be reasonable too, as it brings value for the virtualized environmnets that need gso_size clamping, while presenting minimal risk. Appreciate your opinions. Regards, Shmulik