From: Shmulik Ladkani
Subject: Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu
Date: Fri, 9 Sep 2016 08:48:34 +0300
Message-ID: <20160909084834.7067784c@halley>
References: <1471867570-1406-1-git-send-email-shmulik.ladkani@gmail.com> <20160822125842.GF6199@breakpoint.cc> <20160825120533.352bbd1b@pixies>
In-Reply-To: <20160825120533.352bbd1b@pixies>
To: netdev@vger.kernel.org
Cc: Florian Westphal, "David S. Miller", Hannes Frederic Sowa, Eric Dumazet, Herbert Xu, Alexander Duyck

On Thu, 25 Aug 2016 12:05:33 +0300 Shmulik Ladkani wrote:
> The BUG occurs when GRO occurs on the ingress, and only if GRO merges
> skbs into the frag_list (OTOH when segments are only placed into frags[]
> of a single skb, skb_segment succeeds even if gso_size was altered).
>
> This is due to an assumption that the frag_list members terminate on
> exact MSS boundaries (which no longer holds during gso_size clamping).
>
> We have a few alternatives for gso_size clamping:
>
> 1 Fix 'skb_segment' arithmetic to support inputs that do not match
>   the "frag_list members terminate on exact MSS" assumption.
>
> 2 Perform gso_size clamping in 'ip_finish_output_gso' for non-GROed skbs.
>   Other use cases will still benefit: (a) packets arriving from
>   virtualized interfaces, e.g. tap and friends; (b) packets arriving from
>   veth of other namespaces (packets are locally generated by TCP stack
>   on a different netns).
>
> 3 Ditch the idea, again ;)
>
> We can pursue (1), especially if there are other benefits doing so.
> OTOH due to the current complexity of 'skb_segment' this is a bit risky.
>
> Going with (2) could be reasonable too, as it brings value for
> the virtualized environments that need gso_size clamping, while
> presenting minimal risk.

Summarizing actions taken, in case someone refers to this thread.

- Re (1):
  Spent a short while massaging skb_segment().
  Code is not prepared to support various gso_size inputs.
  Main issue is that if nskb's frags[] get exhausted (but original
  frag_skb's frags[] not yet fully traversed), there's no generation of
  a new skb. Code expects iteration of both nskb's frags[] and
  frag_skb's frags[] to terminate together; the following allocated new
  skb is always a clone of next frag_skb in the original head_skb.
  Supporting various gso_size inputs required an intrusive rewrite.

- Re (2):
  There's no easy way for ip_finish_output_gso() to detect that the skb
  is safe for "gso_size clamping" while preserving GSO/GRO transparency:
  We can know it is "gso_size clamping safe" PER SKB, but it doesn't
  suffice; to preserve the GRO transparency rule, we must know the skb
  arrived from a code flow that is ALWAYS safe for gso_size clamping.

So I ended up identifying the relevant code-flow of the use-case I'm
interested in, verified it is indeed safe for altering gso_size (while
taking a slight risk that this might not hold true in the future).
I've used that mark as the criterion for safe "gso_size clamping" in
'ip_finish_output_gso'.

Yep, not too elegant.

Regards,
Shmulik
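
For anyone landing on this thread later, the "frag_list members terminate
on exact MSS boundaries" assumption can be illustrated with a toy userspace
model. This is only a sketch in Python, not kernel code: the function name
and the byte counts are made up for illustration, and the GRO'd skb is
reduced to a plain list of per-member payload lengths (the head skb is
treated as the first member).

```python
from itertools import accumulate


def members_end_on_mss_boundaries(member_lens, gso_size):
    """Toy check: does every member except the last end on a multiple of gso_size?

    member_lens is a hypothetical stand-in for the payload lengths of the
    head skb and its frag_list members; it is not a kernel data structure.
    """
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    # Cumulative payload offset at which each member ends.
    ends = list(accumulate(member_lens))
    # The last member may be short, so only the earlier ones are checked.
    return all(end % gso_size == 0 for end in ends[:-1])


# Toy GRO result: three full 1448-byte segments plus a short tail.
gro_members = [1448, 1448, 1448, 500]

# With the gso_size the members were merged at, the assumption holds.
print(members_end_on_mss_boundaries(gro_members, 1448))

# After clamping gso_size to a smaller value, member ends no longer
# coincide with segment ends, which is the case described under "Re (1)".
print(members_end_on_mss_boundaries(gro_members, 1400))
```

The first call reports that the boundaries line up and the second that they
do not; the model says nothing about how skb_segment() itself would have to
change to cope with the second case.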