linux-nfs.vger.kernel.org archive mirror
From: "J.Bruce Fields" <bfields@citi.umich.edu>
To: NeilBrown <neilb@suse.de>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>, NFS <linux-nfs@vger.kernel.org>
Subject: Re: Is tcp autotuning really what NFS wants?
Date: Tue, 9 Jul 2013 22:27:35 -0400
Message-ID: <20130710022735.GI8281@fieldses.org>
In-Reply-To: <20130710092255.0240a36d@notabene.brown>

On Wed, Jul 10, 2013 at 09:22:55AM +1000, NeilBrown wrote:
> 
> Hi,
>  I just noticed this commit:
> 
> commit 9660439861aa8dbd5e2b8087f33e20760c2c9afc
> Author: Olga Kornievskaia <aglo@citi.umich.edu>
> Date:   Tue Oct 21 14:13:47 2008 -0400
> 
>     svcrpc: take advantage of tcp autotuning
> 
> 
> which I must confess surprised me.  I wonder if the full implications of
> removing that functionality were understood.
> 
> Previously nfsd would set the transmit buffer space for a connection to
> ensure there is plenty to hold all replies.  Now it doesn't.
> 
> nfsd refuses to accept a request if there isn't enough space in the transmit
> buffer to send a reply.  This is important to ensure that each reply gets
> sent atomically without blocking and there is no risk of replies getting
> interleaved.
> 
> The server starts out with a large estimate of the reply space (1M) and for
> NFSv3 and v2 it quickly adjusts this down to something realistic.  For NFSv4
> it is much harder to estimate the space needed so it just assumes every
> reply will require 1M of space.
> 
> This means that with NFSv4, as soon as you have enough concurrent requests
> such that 1M each reserves all of whatever window size was auto-tuned, new
> requests on that connection will be ignored.
>
> This could significantly limit the amount of parallelism that can be achieved
> for a single TCP connection (and given that the Linux client strongly prefers
> a single connection now, this could become more of an issue).

Worse, I believe it can deadlock completely if the transmit buffer
shrinks too far, and people really have run into this:

	http://mid.gmane.org/<20130125185748.GC29596@fieldses.org>

Trond's suggestion looked at the time like it might work and be doable:

	http://mid.gmane.org/<4FA345DA4F4AE44899BD2B03EEEC2FA91833C1D8@sacexcmbx05-prd.hq.netapp.com>

but I dropped it.

The v4-specific situation might not be hard to improve: the v4
processing decodes the whole compound at the start, so it knows the
sequence of ops before it does anything else and could compute a tighter
bound on the reply size at that point.
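
To make the arithmetic concrete, here is a minimal sketch of the
admission check described above and of a per-compound bound computed
from the decoded op list.  It is not the sunrpc or nfsd code: the op
kinds, the per-op byte counts, the 128-byte header allowance and every
function name are assumptions made up for illustration.  Only the 1M
worst-case estimate comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* The 1M per-reply estimate mentioned in the thread. */
#define WORST_CASE_REPLY ((size_t)1 << 20)

/* Hypothetical op kinds; a real server would cover every NFSv4 op. */
enum op_kind { OP_GETATTR, OP_READ, OP_WRITE, OP_OTHER };

struct op {
	enum op_kind kind;
	size_t count;	/* requested byte count, used only by OP_READ */
};

/* Assumed upper bound on the reply bytes one op can generate. */
static size_t op_reply_bound(const struct op *op)
{
	switch (op->kind) {
	case OP_GETATTR:
		return 512;		/* assumed attribute size cap */
	case OP_READ:
		return 64 + op->count;	/* assumed header plus data */
	case OP_WRITE:
		return 64;		/* assumed small status reply */
	default:
		return WORST_CASE_REPLY; /* unknown op: stay pessimistic */
	}
}

/* Sum the per-op bounds once the whole compound has been decoded. */
size_t compound_reply_bound(const struct op *ops, size_t nops)
{
	size_t total = 128;	/* assumed fixed RPC/compound header */
	size_t i;

	for (i = 0; i < nops; i++)
		total += op_reply_bound(&ops[i]);
	return total;
}

/* Accept a request only if its reply fits in the unreserved space. */
int can_accept(size_t sndbuf, size_t reserved, size_t reply_estimate)
{
	assert(reserved <= sndbuf);
	return sndbuf - reserved >= reply_estimate;
}

/* How many requests of one estimated size fit in the buffer at once. */
size_t max_concurrent(size_t sndbuf, size_t reply_estimate)
{
	assert(reply_estimate > 0);
	return sndbuf / reply_estimate;
}
```

With a 4M send buffer, max_concurrent() gives 4 requests under the 1M
estimate.  A GETATTR+WRITE compound is bounded at 704 bytes under the
assumed numbers above, so thousands of such requests would fit in the
same buffer.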

> I don't know if this is a real issue that needs addressing - I hit it in the
> context of a server filesystem which was misbehaving and so caused this issue
> to become obvious.  But in this case it is certainly the filesystem, not the
> NFS server, which is causing the problem.

Yeah, it looks like a real problem.

Some good test cases would be useful if we could find some.

And, yes, my screwup for merging 966043986 without solving those other
problems first.  I was confused.

It does make a difference on high bandwidth-delay-product networks
(something people have also hit).  I'd rather not regress there and also would
rather not require manual tuning for something we should be able to get
right automatically.
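
For a sense of scale, here is a small worked calculation, reading the
network case above as one with a high bandwidth-delay product (link
rate times round-trip time), which is roughly the amount of data that
must be in flight to keep the link busy.  The function name and the
example link figures are illustrative assumptions, not measurements
from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bandwidth-delay product in bytes for a link rate in bits per second
 * and a round-trip time in microseconds.
 */
uint64_t bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_usec)
{
	uint64_t bytes_per_sec = bits_per_sec / 8;

	/* Guard the multiplication below against 64-bit overflow. */
	assert(rtt_usec == 0 || bytes_per_sec <= UINT64_MAX / rtt_usec);
	return bytes_per_sec * rtt_usec / 1000000;
}
```

For example, bdp_bytes(10000000000ULL, 50000) (10 Gb/s with a 50 ms
round trip) is 62500000 bytes, far more than a small fixed send buffer
would hold.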

--b.
