From: "J.Bruce Fields" <bfields@citi.umich.edu>
To: NeilBrown <neilb@suse.de>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>, NFS <linux-nfs@vger.kernel.org>
Subject: Re: Is tcp autotuning really what NFS wants?
Date: Tue, 9 Jul 2013 22:27:35 -0400
Message-ID: <20130710022735.GI8281@fieldses.org>
In-Reply-To: <20130710092255.0240a36d@notabene.brown>
On Wed, Jul 10, 2013 at 09:22:55AM +1000, NeilBrown wrote:
>
> Hi,
> I just noticed this commit:
>
> commit 9660439861aa8dbd5e2b8087f33e20760c2c9afc
> Author: Olga Kornievskaia <aglo@citi.umich.edu>
> Date: Tue Oct 21 14:13:47 2008 -0400
>
> svcrpc: take advantage of tcp autotuning
>
>
> which I must confess surprised me. I wonder if the full implications of
> removing that functionality were understood.
>
> Previously nfsd would set the transmit buffer space for a connection to
> ensure there is plenty to hold all replies. Now it doesn't.
>
> nfsd refuses to accept a request if there isn't enough space in the transmit
> buffer to send a reply. This is important to ensure that each reply gets
> sent atomically without blocking and there is no risk of replies getting
> interleaved.
>
> The server starts out with a large estimate of the reply space (1M) and for
> NFSv3 and v2 it quickly adjusts this down to something realistic. For NFSv4
> it is much harder to estimate the space needed so it just assumes every
> reply will require 1M of space.
>
> This means that with NFSv4, as soon as you have enough concurrent requests
> such that 1M each reserves all of whatever window size was auto-tuned, new
> requests on that connection will be ignored.
>
> This could significantly limit the amount of parallelism that can be achieved
> for a single TCP connection (and given that the Linux client strongly prefers
> a single connection now, this could become more of an issue).

Worse, I believe it can deadlock completely if the transmit buffer
shrinks too far, and people really have run into this:

	http://mid.gmane.org/<20130125185748.GC29596@fieldses.org>

Trond's suggestion looked at the time like it might work and be doable:

	http://mid.gmane.org/<4FA345DA4F4AE44899BD2B03EEEC2FA91833C1D8@sacexcmbx05-prd.hq.netapp.com>

but I dropped it.

The v4-specific situation might not be hard to improve: the v4
processing decodes the whole compound at the start, so it knows the
sequence of ops before it does anything else and could compute a tighter
bound on the reply size at that point.
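
A rough sketch of that idea, not what nfsd does today: once the compound
is decoded, walk the op list, sum a per-op upper bound, and use the sum
instead of the blanket 1M when checking for send space. Every name and
per-op number below is hypothetical and chosen only for illustration;
any op the sketch cannot bound falls back to the 1M worst case.

```c
#include <stddef.h>

/* Hypothetical op codes; placeholders, not nfsd's real op table. */
enum sketch_op {
	SK_OP_PUTFH,
	SK_OP_GETATTR,
	SK_OP_READ,
	SK_OP_WRITE,
	SK_OP_OTHER
};

#define SK_MAX_REPLY (1024 * 1024)	/* the current blanket 1M estimate */
#define SK_HDR_BYTES 256		/* assumed fixed per-compound overhead */

struct sketch_opdesc {
	enum sketch_op op;
	size_t read_count;	/* bytes requested; only used for SK_OP_READ */
};

/* Assumed per-op reply bounds; the numbers are illustrative guesses. */
static size_t sketch_op_bound(const struct sketch_opdesc *d)
{
	switch (d->op) {
	case SK_OP_PUTFH:
		return 16;
	case SK_OP_GETATTR:
		return 512;
	case SK_OP_WRITE:
		return 64;
	case SK_OP_READ:
		return 64 + d->read_count;
	default:
		return SK_MAX_REPLY;	/* unknown op: assume the worst case */
	}
}

/* Sum the per-op bounds over the decoded compound, capped at 1M. */
size_t sketch_compound_bound(const struct sketch_opdesc *ops, size_t nops)
{
	size_t total = SK_HDR_BYTES;
	size_t i;

	for (i = 0; i < nops; i++) {
		size_t b = sketch_op_bound(&ops[i]);

		if (b >= SK_MAX_REPLY - total)
			return SK_MAX_REPLY;
		total += b;
	}
	return total;
}

/* Accept a request only if its reply bound fits in the unreserved space. */
int sketch_can_accept(size_t sndbuf, size_t reserved, size_t bound)
{
	return reserved <= sndbuf && bound <= sndbuf - reserved;
}
```

With these placeholder numbers a PUTFH+GETATTR compound would reserve
under 1K rather than 1M, so a request like that could still be accepted
when less than 1M of the window is left unreserved.
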
> I don't know if this is a real issue that needs addressing - I hit in the
> context of a server filesystem which was misbehaving and so caused this issue
> to become obvious. But in this case it is certainly the filesystem, not the
> NFS server, which is causing the problem.

Yeah, it looks like a real problem.

Some good test cases would be useful if we could find some.

And, yes, my screwup for merging 966043986 without solving those other
problems first. I was confused.

It does make a difference on high bandwidth-delay-product networks (something
people have also hit). I'd rather not regress there and also would
rather not require manual tuning for something we should be able to get
right automatically.
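
For a sense of scale, the buffer needed to keep a link busy is roughly
bandwidth times round-trip time. A minimal sketch of that arithmetic
follows; the function name and the sample link figures in the note after
it are assumptions for illustration, not measurements from any report.

```c
#include <stdint.h>

/*
 * Bytes that must be in flight to fill the pipe: bandwidth * RTT / 8.
 * Divides by 8 first, so bandwidths that are not a multiple of 8 bit/s
 * are rounded down slightly.
 */
uint64_t sketch_bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_usec)
{
	return bits_per_sec / 8 * rtt_usec / 1000000;
}
```

A hypothetical 10 Gbit/s path with a 50 ms round trip works out to
62500000 bytes, far more than any small fixed send buffer would allow.
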
--b.