From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Hutchings
Subject: Re: RFC: MTU for serving NFS on Infiniband
Date: Tue, 24 Aug 2010 23:20:41 +0100
Message-ID: <1282688441.22839.34.camel@localhost>
References: <20100823080543.319143e3@nehalam> <1282672647.2302.15.camel@achroite.uk.solarflarecom.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Stephen Hemminger, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
 "David S. Miller", Alexey Kuznetsov, "Pekka Savola (ipv6)", James Morris,
 Hideaki YOSHIFUJI, Patrick McHardy
To: Marc Aurele La France
Return-path:
Received: from mail.solarflare.com ([216.237.3.220]:17953 "EHLO
 exchange.solarflare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755155Ab0HXWUv (ORCPT ); Tue, 24 Aug 2010 18:20:51 -0400
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2010-08-24 at 13:49 -0600, Marc Aurele La France wrote:
> On Tue, 24 Aug 2010, Ben Hutchings wrote:
> > On Tue, 2010-08-24 at 09:14 -0600, Marc Aurele La France wrote:
> >> On Mon, 23 Aug 2010, Stephen Hemminger wrote:
> >>> On Mon, 23 Aug 2010 08:44:37 -0600 (MDT)
> >>> Marc Aurele La France wrote:
> >>>> In regrouping for my next tack at this, I noticed that all stack traces go
> >>>> through ip_append_data(). This would be ipv6_append_data() in the IPv6 case.
> >>>> A _very_ rough draft that would have ip_append_data() temporarily drop down
> >>>> to a smaller fake MTU follows ...
> >
> >>> Why doesn't NFS generate page size fragments? Does Infiniband or your
> >>> device not support this? Any thing that requires higher order allocation
> >>> is going to unstable under load. Let's fix the cause not the apply bandaid
> >>> solution to the symptom.
> >
> >> From what I can tell, IP fragmentation is done centrally.
> > [...]
> >
> > Stephen and I are not talking about IP fragmentation, but about the
> > ability to append 'fragments' to an skb rather than putting the entire
> > packet payload in a linear buffer. See
> > .
>
> Any payload has to either fit in the MTU, or has to be broken up into
> MTU-sized (or less) fragments, come hell or high water. That this is done
> centrally is a good thing.

Not necessarily. Offloading it to hardware, where possible, is usually a
performance win.

> It is the "(or less)" part that I am working towards here.

The inability to allocate large linear buffers is not a good reason to
generate packets smaller than the MTU. You are working around the real
problem.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.