From: Stephen Hemminger <shemminger@vyatta.com>
To: Marc Aurele La France <tsi@ualberta.ca>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: RFC: MTU for serving NFS on Infiniband
Date: Mon, 23 Aug 2010 08:05:43 -0700
Message-ID: <20100823080543.319143e3@nehalam>
In-Reply-To: <alpine.LNX.2.00.1008230842290.9325@abcyxhiz.aict.ualberta.ca>
On Mon, 23 Aug 2010 08:44:37 -0600 (MDT)
Marc Aurele La France <tsi@ualberta.ca> wrote:
> My apologies for the multiple post. I got bit the first time around by my
> MUA's configuration.
>
> ----
>
> Greetings.
>
> For some time now, the kernel and I have been having an argument over what
> the MTU should be for serving NFS over Infiniband. I say 65520, the
> documented maximum for connected mode. But, so far, I've been unable to have
> anything over 32192 remain stable.
>
> Back in the 2.6.14 -> .15 period, sunrpc's sk_buff allocations were changed
> from GFP_KERNEL to GFP_ATOMIC (b079fa7baa86b47579f3f60f86d03d21c76159b8
> mainstream commit). Understandably, this was to prevent recursion through
> the NFS and sunrpc code. This is fine for the most common MTU out there, as
> the kernel is almost certain to find a free page. But, as one increases the
> MTU, memory fragmentation starts to play a role in nixing these allocations.
>
> These allocation failures ultimately result in sparse files being written
> through NFS. Granted, many of my users' applications are oblivious to
> this because they don't check for such errors. But it would be nice if the
> kernel were more resilient in this regard.
>
> For a few months now, I've been running with sunrpc sk_buff allocations using
> GFP_NOFS instead, which allows for dirty data to be flushed out and still
> avoids recursion through sunrpc. With this, I've been able to increase the
> stable MTU to 32192. But no further, as eventually there is no dirty data
> left and memory fragmentation becomes mostly due to yet-to-be-sync'ed
> filesystem data. There's also the matter that using GFP_NOFS for this can
> slow down NFS quite a bit.
>
> In regrouping for my next tack at this, I noticed that all stack traces go
> through ip_append_data(). This would be ipv6_append_data() in the IPv6 case.
> A _very_ rough draft that would have ip_append_data() temporarily drop down
> to a smaller fake MTU follows ...
Why doesn't NFS generate page-size fragments? Does Infiniband or your
device not support this? Anything that requires a higher-order allocation
is going to be unstable under load. Let's fix the cause rather than apply
a band-aid solution to the symptom.
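
To make the "higher-order allocation" point concrete, here is a minimal
sketch of how the allocation order grows with the size of one contiguous
buffer. It is illustrative only: it assumes 4 KiB pages, treats the MTU as
the whole buffer (ignoring any header or per-buffer overhead), and the
name alloc_order is hypothetical, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Return the smallest order such that (page_size << order) >= nbytes.

    An order-N allocation is 2**N physically contiguous pages.
    """
    pages = -(-nbytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for mtu in (1500, 32192, 65520):
    print(mtu, alloc_order(mtu))
# 1500 0
# 32192 3
# 65520 4
```

An order-0 request needs only a single free page. Order 3 or 4 needs 8 or
16 physically contiguous free pages, which is much harder to satisfy once
memory is fragmented, particularly for allocations that cannot wait.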