From: James Vanns <james.vanns@framestore.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Where in the server code is fsinfo rtpref calculated?
Date: Wed, 15 May 2013 15:34:27 +0100 (BST)
Message-ID: <1995711958.19982333.1368628467473.JavaMail.root@framestore.com>
In-Reply-To: <20130515141508.GH16811@fieldses.org>


> On Wed, May 15, 2013 at 02:42:42PM +0100, James Vanns wrote:
> > > fs/nfsd/nfssvc.c:nfsd_get_default_maxblksize() is probably a good
> > > starting point.  Its caller, nfsd_create_serv(), calls
> > > svc_create_pooled() with the result that's calculated.
> > 
> > Hmm. If I've read this section of code correctly, it seems to me
> > that on most modern NFS servers (using TCP as the transport) the
> > default
> > and preferred blocksize negotiated with clients will almost always
> > be
> > 1MB - the maximum RPC payload. The nfsd_get_default_maxblksize()
> > function
> > seems obsolete for modern 64-bit servers with at least 4G of RAM as
> > it'll
> > always prefer this upper bound instead of any value calculated
> > according to
> > available RAM.
> 
> Well, "obsolete" is an odd way to put it--the code is still expected
> to work on smaller machines.

Poor choice of words perhaps. I guess I'm just used to NFS servers being
pretty hefty pieces of kit and 'small' workstations having a couple of GB
of RAM too.
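
For reference, the RAM-based default under discussion can be modelled
roughly as below. This is a sketch from a reading of
nfsd_get_default_maxblksize(), not a copy of it: the 1/4096-of-memory
target and the 8KB floor are assumptions, and default_max_blksize() is a
hypothetical name. Only the 1MB ceiling and the 4G threshold come from
this thread.

```python
# Rough model of how the default max block size might scale with RAM.
# ASSUMPTIONS: target of 1/4096 of low memory and an 8KB floor; only the
# 1MB ceiling is stated in the thread.

NFSSVC_MAXBLKSIZE = 1024 * 1024  # 1MB, the maximum RPC payload


def default_max_blksize(lowmem_bytes):
    """Halve the 1MB ceiling until it fits the target, stopping at 8KB."""
    target = lowmem_bytes >> 12  # assumed: 1/4096 of memory
    ret = NFSSVC_MAXBLKSIZE
    while ret > target and ret >= 8 * 1024 * 2:
        ret //= 2
    return ret


if __name__ == "__main__":
    for ram_mb in (32, 128, 1024, 4096, 65536):
        size = default_max_blksize(ram_mb * 1024 * 1024)
        print("%6d MB RAM -> %4d KB" % (ram_mb, size // 1024))
```

Under these assumptions anything with 4G or more lands on the 1MB
ceiling, which is the behaviour described above.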

> Arguments welcome about the defaults, though I wonder whether it
> would be better to be doing this sort of calculation in user space.

See below.

> > For what it's worth (not sure if I specified this) I'm running
> > kernel 2.6.32.
> > 
> > Anyway, this file/function appears to set the default *max*
> > blocksize. I haven't
> > read all the related code yet, but does the preferred block size
> > derive
> > from this maximum too?
> 
> See
> > > For fsinfo see fs/nfsd/nfs3proc.c:nfsd3_proc_fsinfo, which uses
> > > svc_max_payload().

I've just returned from nfsd3_proc_fsinfo() and found what I would
consider an odd decision - perhaps nothing better was suggested at
the time. It seems to me that in response to an FSINFO call the reply
stuffs the max_block_size value into both the maximum *and* preferred
block sizes for both read and write. A 1MB block size for a preferred
default is a little high! If a disk is reading at 33MB/s and we have just
a single server running 64 knfsd threads, each servicing a READ call for
1MB of data, then all of a sudden each thread sees a read speed of only
~512KB/s (33MB/s shared 64 ways), and that is without network latencies.
And of course we will probably have 100s of requests queued behind each
knfsd waiting for these ~512KB/s reads to finish. All of a sudden our
user experience is rather poor :(
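
A back-of-envelope version of that arithmetic, as a sketch only. It
assumes the disk's bandwidth is shared evenly between the knfsd threads
and ignores caching, readahead, seeks and the network; the function names
are made up for illustration.

```python
# Back-of-envelope model of the scenario above.
# ASSUMPTIONS: disk bandwidth is shared evenly between threads; caching,
# readahead, seek penalties and network latency are all ignored.

MB = 1024 * 1024


def per_thread_rate(disk_bytes_per_s, threads):
    """Bytes/s each thread sees if the disk is shared evenly."""
    return disk_bytes_per_s / threads


def read_latency(block_bytes, disk_bytes_per_s, threads):
    """Seconds for one READ of block_bytes to complete under that sharing."""
    return block_bytes / per_thread_rate(disk_bytes_per_s, threads)


if __name__ == "__main__":
    disk = 33 * MB
    threads = 64
    rate_kb = per_thread_rate(disk, threads) / 1024
    print("per-thread rate: %.0f KB/s" % rate_kb)
    for block in (32 * 1024, 256 * 1024, MB):
        secs = read_latency(block, disk, threads)
        print("%5d KB READ: %.2f s" % (block // 1024, secs))
```

With those numbers each thread gets roughly 528KB/s, so a single 1MB READ
takes close to two seconds to complete, and anything queued behind it
waits at least that long.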

Perhaps a better suggestion would be to at least expose the maximum and
preferred block sizes (for both read and write) via a sysctl key so an
administrator can set them to the underlying block sizes of the file
system or physical device?

Perhaps the defaults should at least be a smaller multiple of the page
size or somewhere between that and the PDU of the network layer the
service is bound to.

Just my tuppence - and my maths might be flawed ;)

Jim

> I'm not sure what the history is behind that logic, though.
> 
> --b.
> 

-- 
Jim Vanns
Senior Software Developer
Framestore
