From: James Vanns <james.vanns@framestore.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Where in the server code is fsinfo rtpref calculated?
Date: Wed, 15 May 2013 17:32:15 +0100 (BST)
Message-ID: <1706515764.20038371.1368635535864.JavaMail.root@framestore.com>
In-Reply-To: <20130515144749.GI16811@fieldses.org>
<snip>
> > I've just returned from nfsd3_proc_fsinfo() and found what I would
> > consider an odd decision - perhaps nothing better was suggested at
> > the time. It seems to me that in response to an FSINFO call the reply
> > stuffs the max_block_size value in both the maximum *and* preferred
> > block sizes for both read and write. A 1MB block size for a preferred
> > default is a little high! If a disk is reading at 33MB/s and we have
> > just a single server running 64 knfsd and each READ call is requesting
> > 1MB of data then all of a sudden we have an aggregate read speed of
> > ~512k/s
>
> I lost you here.
OK, so what we're seeing is the large majority of our ~700 clients
(all Linux 2.6.32 based NFS clients) issuing READ requests of 1MB in size.
After the initial MOUNT request has been granted, an FSINFO call is made. The
REPLY from the server (also running Linux 2.6.32) includes rtmax, rtpref,
wtmax and wtpref, all of which are set to 1MB. This 1MB appears to come from
the code I described earlier: all four values are set to whatever
nfsd_get_default_max_blksize() returns.
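
To illustrate the behaviour described above, here is a minimal Python model
of the reply, not the kernel code itself. The class FsinfoReply, the function
build_fsinfo_reply() and the constant DEFAULT_MAX_BLKSIZE are hypothetical
names made up for this sketch; the 1MB value is simply the figure observed on
the wire.

```python
from dataclasses import dataclass

# Assumed value: the 1MB observed in the FSINFO replies. On a real server
# the figure comes from nfsd_get_default_max_blksize() and may differ.
DEFAULT_MAX_BLKSIZE = 1024 * 1024


@dataclass
class FsinfoReply:
    """Only the four transfer-size fields discussed in this thread."""
    rtmax: int
    rtpref: int
    wtmax: int
    wtpref: int


def build_fsinfo_reply(max_block_size: int = DEFAULT_MAX_BLKSIZE) -> FsinfoReply:
    # Model of the observed behaviour: a single value is used for the
    # maximum and the preferred size, for both read and write.
    return FsinfoReply(
        rtmax=max_block_size,
        rtpref=max_block_size,
        wtmax=max_block_size,
        wtpref=max_block_size,
    )


reply = build_fsinfo_reply()
print(reply)
```

In this model a client has no way to distinguish the preferred size from the
maximum, since both carry the same number.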
> > that is without network latencies. And of course we will probably
> > have 100s of requests queued behind each knfsd waiting for these
> > 512k reads to finish. All of a sudden our user experience is rather
> > poor :(
>
> Note the preferred size is not a minimum--the client isn't forced to do
> 1MB reads if it really only wants 1 page, for example, if that's what
> you mean.
If no rsize/wsize has been specified on the client mount, the client uses
the negotiated values above, so any application read() exceeding that
maximum is issued as READs of that size. That maximum (the default 1MB) is
still quite large, I reckon.
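
A rough sketch of the arithmetic behind the estimate quoted earlier, using
the figures from that estimate (33MB/s disk, 64 knfsd threads, 1MB READs).
It assumes the disk's throughput is shared evenly between all threads, which
is a simplification: it ignores seeks, caching and network latency. The
function names are made up for this example.

```python
# Figures taken from the estimate quoted earlier in this message.
DISK_MB_PER_S = 33.0   # sequential read rate of the disk
KNFSD_THREADS = 64     # knfsd threads, all assumed busy with a READ
READ_SIZE_MB = 1.0     # size of each READ request


def per_thread_mb_per_s(disk_mb_per_s: float, threads: int) -> float:
    # Assumption: the disk bandwidth is split evenly between the threads.
    return disk_mb_per_s / threads


def seconds_per_read(read_size_mb: float, disk_mb_per_s: float, threads: int) -> float:
    # Time for one thread to complete a single READ at its share of the disk.
    return read_size_mb / per_thread_mb_per_s(disk_mb_per_s, threads)


share = per_thread_mb_per_s(DISK_MB_PER_S, KNFSD_THREADS)
latency = seconds_per_read(READ_SIZE_MB, DISK_MB_PER_S, KNFSD_THREADS)
print(f"per-thread share:  {share:.3f} MB/s")
print(f"time per 1MB READ: {latency:.2f} s")
```

Under this model the ~512k/s figure is each thread's share rather than the
aggregate (the disk still delivers 33MB/s in total), and each 1MB READ takes
close to two seconds to complete. Anything queued behind it waits at least
that long.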
I'm not sure at which point the client uses the preferred (optimal) block
size rather than the maximum: because the server sets both to the same
value, I can't tell which is being used ;)
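
One plausible client-side policy, sketched in Python to make the question
concrete. This is a hypothetical model, not taken from the Linux NFS client
source: effective_rsize() and split_read() are names invented here, and the
rule "use the mount option if given, otherwise the preferred size, capped at
the maximum" is an assumption.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024


def effective_rsize(mount_rsize: Optional[int], rtpref: int, rtmax: int) -> int:
    """Hypothetical policy: honour the rsize mount option if one was given,
    otherwise start from the server's preferred size; never exceed rtmax."""
    size = mount_rsize if mount_rsize else rtpref
    return min(size, rtmax)


def split_read(offset: int, length: int, rsize: int) -> List[Tuple[int, int]]:
    """Split one application read into (offset, count) READ requests,
    each no larger than rsize."""
    chunks = []
    end = offset + length
    while offset < end:
        count = min(rsize, end - offset)
        chunks.append((offset, count))
        offset += count
    return chunks


# No rsize on the mount, server advertising 1MB for both rtpref and rtmax.
rsize = effective_rsize(None, rtpref=MB, rtmax=MB)
requests = split_read(0, 3 * MB + 4096, rsize)
print(rsize, len(requests), requests[-1])
```

With rtpref equal to rtmax the result is 1MB whichever of the two the client
starts from, so the READ sizes seen on the wire cannot show which value is
actually in use.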
> (I haven't actually looked at how typical clients used rt/wtpref.)
>
> --b.
>
> > Perhaps a better suggestion would be to at least expose the maximum
> > and preferred block sizes (for both read and write) via a sysctl key
> > so an administrator can set it to the underlying block sizes of the
> > file system or physical device?
> >
> > Perhaps the defaults should at least be a smaller multiple of the
> > page size or somewhere between that and the PDU of the network layer
> > the service is bound to.
> >
> > Just my tuppence - and my maths might be flawed ;)
> >
> > Jim
> >
> > > I'm not sure what the history is behind that logic, though.
> > >
> > > --b.
> > >
> >
> > --
> > Jim Vanns
> > Senior Software Developer
> > Framestore
>
--
Jim Vanns
Senior Software Developer
Framestore