From: "J.Bruce Fields" <bfields@citi.umich.edu>
To: NeilBrown <neilb@suse.de>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>, NFS <linux-nfs@vger.kernel.org>
Subject: Re: Is tcp autotuning really what NFS wants?
Date: Tue, 9 Jul 2013 22:27:35 -0400
Message-ID: <20130710022735.GI8281@fieldses.org>
In-Reply-To: <20130710092255.0240a36d@notabene.brown>
On Wed, Jul 10, 2013 at 09:22:55AM +1000, NeilBrown wrote:
>
> Hi,
> I just noticed this commit:
>
> commit 9660439861aa8dbd5e2b8087f33e20760c2c9afc
> Author: Olga Kornievskaia <aglo@citi.umich.edu>
> Date: Tue Oct 21 14:13:47 2008 -0400
>
> svcrpc: take advantage of tcp autotuning
>
>
> which I must confess surprised me. I wonder if the full implications of
> removing that functionality were understood.
>
> Previously nfsd would set the transmit buffer space for a connection to
> ensure there is plenty to hold all replies. Now it doesn't.
>
> nfsd refuses to accept a request if there isn't enough space in the transmit
> buffer to send a reply. This is important to ensure that each reply gets
> sent atomically without blocking and there is no risk of replies getting
> interleaved.
>
> The server starts out with a large estimate of the reply space (1M) and for
> NFSv3 and v2 it quickly adjusts this down to something realistic. For NFSv4
> it is much harder to estimate the space needed so it just assumes every
> reply will require 1M of space.
>
> This means that with NFSv4, as soon as you have enough concurrent requests
> such that 1M each reserves all of whatever window size was auto-tuned, new
> requests on that connection will be ignored.
>
> This could significantly limit the amount of parallelism that can be achieved
> for a single TCP connection (and given that the Linux client strongly prefers
> a single connection now, this could become more of an issue).
Worse, I believe it can deadlock completely if the transmit buffer
shrinks too far, and people really have run into this:

http://mid.gmane.org/<20130125185748.GC29596@fieldses.org>

Trond's suggestion looked at the time like it might work and be doable:

http://mid.gmane.org/<4FA345DA4F4AE44899BD2B03EEEC2FA91833C1D8@sacexcmbx05-prd.hq.netapp.com>

but I dropped it.

The v4-specific situation might not be hard to improve: the v4
processing decodes the whole compound at the start, so it knows the
sequence of ops before it does anything else and could compute a tighter
bound on the reply size at that point.
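
As a rough illustration of that idea (a Python model, not nfsd code):
once the compound is decoded, sum a per-op upper bound instead of
reserving 1M for every request. The op table, the byte limits, the 64-byte
READ overhead, and both function names are made-up placeholders for the
sake of the sketch.

```python
# Illustrative model only: the per-op limits are placeholders, not nfsd values.
WORST_CASE_REPLY = 1024 * 1024  # the 1M per-request assumption quoted above

# Hypothetical upper bounds on each op's reply size, in bytes.
# None means the bound depends on the count requested by the client.
MAX_REPLY_BYTES = {
    "PUTFH": 16,
    "LOOKUP": 16,
    "GETATTR": 4096,
    "READ": None,
}


def compound_reply_bound(ops):
    """Bound the reply size of an already decoded compound.

    ops is a list of (op_name, requested_count) tuples. Any op missing
    from the table falls back to the worst case, so the result never
    underestimates.
    """
    total = 0
    for name, count in ops:
        if name not in MAX_REPLY_BYTES:
            return WORST_CASE_REPLY
        limit = MAX_REPLY_BYTES[name]
        if limit is None:
            limit = count + 64  # payload plus an assumed fixed overhead
        total += limit
    return min(total, WORST_CASE_REPLY)


def max_concurrent(sndbuf_bytes, reservation_bytes):
    """How many requests fit if each one reserves reservation_bytes."""
    return sndbuf_bytes // reservation_bytes
```

With a 4M send buffer, a flat 1M reservation admits only 4 requests at a
time, while a compound bounded at a few tens of kilobytes would admit
on the order of a hundred. The real per-op limits would have to come from
the XDR definitions.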
> I don't know if this is a real issue that needs addressing - I hit it in the
> context of a server filesystem which was misbehaving and so caused this issue
> to become obvious. But in this case it is certainly the filesystem, not the
> NFS server, which is causing the problem.
Yeah, it looks like a real problem.

Some good test cases would be useful if we could find some.

And, yes, my screwup for merging 966043986 without solving those other
problems first. I was confused.

It does make a difference on networks with a high bandwidth-delay
product (something people have also hit). I'd rather not regress there
and also would rather not require manual tuning for something we should
be able to get right automatically.
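
For a sense of scale, the bandwidth-delay product is the amount of data
that has to be in flight to keep a link full, so it is roughly the send
buffer a single connection needs. A small Python sketch of the
arithmetic; the function name and the link figures are illustrative
assumptions, not measurements from any report.

```python
def bdp_bytes(bits_per_second, rtt_ms):
    """Bandwidth-delay product in bytes for a given link rate and RTT."""
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes.
    return bits_per_second * rtt_ms // 8000


# Example figures (assumed): a 10 Gbit/s path with a 50 ms round trip.
long_fat_pipe = bdp_bytes(10_000_000_000, 50)

# Example figures (assumed): a 1 Gbit/s LAN with a 1 ms round trip.
local_link = bdp_bytes(1_000_000_000, 1)
```

The first case works out to about 62.5 MB and the second to 125 KB,
which is why a fixed buffer size that suits one tends to be wrong for
the other.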
--b.