[Lustre-devel] Export over NFS sets rsize to 1MB?

From: Dilger, Andreas <andreas.dilger@intel.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Export over NFS sets rsize to 1MB?
Date: Mon, 13 May 2013 21:58:44 +0000
Message-ID: <CDB6B2AB.32164%andreas.dilger@intel.com>
In-Reply-To: <1975652007.19339235.1368451178426.JavaMail.root@framestore.com>

On 2013/13/05 7:19 AM, "James Vanns" <james.vanns@framestore.com> wrote:
>Hello dev list. Apologies for a post to perhaps the wrong group but I'm
>having a bit of difficulty locating any document or wiki describing how
>and/or where the preferred read and write block size for NFS exports of
>a Lustre filesystem are set to 1MB?

1MB is the RPC size and "optimal IO size" for Lustre.  This would normally
be exported to applications via the stat(2) "st_blksize" field, though
st_blksize is typically 2MB (2x the RPC size in order to allow some
pipelining).  I suspect this is where NFS is getting the value, since it
is not passed up via the statfs(2) call.
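
A minimal sketch for comparing the two hints on a given mount, assuming a
Linux client with Python 3.  The helper name io_size_hints is hypothetical,
and os.statvfs() wraps statvfs(3) rather than statfs(2) directly, so treat
the second number as an approximation of what statfs(2) would report:

```python
import os
import sys


def io_size_hints(path):
    """Return (st_blksize, f_bsize) for the given path.

    st_blksize is the per-file "preferred IO size" from stat(2);
    f_bsize is the filesystem block size reported via statvfs(3).
    """
    st = os.stat(path)
    vfs = os.statvfs(path)
    return st.st_blksize, vfs.f_bsize


if __name__ == "__main__":
    # Pass a file or directory on the Lustre (or NFS) mount as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    blksize, bsize = io_size_hints(target)
    print(f"{target}: st_blksize={blksize} f_bsize={bsize}")
```

Running it against a path on the Lustre client mount and again on the NFS
client mount should show which value each layer presents.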

>Basically we have two Lustre filesystems exported over NFSv3. Our lustre
>block size is 4k and the max r/w size is 1MB. Without any special
>rsize/wsize options set for the export the default one suggested to
>clients (MOUNT->FSINFO RPC) as the preferred size is set to 1MB. How
>does Lustre figure this out? Other non-Lustre exports are generally much
>less; 4, 8, 16 or 32 kilobytes.

Taking a quick look at the code, it looks like NFS TCP connections all
have a max_payload of 1MB, but this is limited in a number of places in
the code by the actual read size and by other maxima (for which I can't
easily find the source value).
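
Whatever the server suggests, the sizes an NFS client actually ended up with
appear as the rsize= and wsize= options in /proc/mounts.  A rough sketch for
pulling them out, assuming a Linux NFS client; the function name
nfs_transfer_sizes is hypothetical:

```python
def nfs_transfer_sizes(mounts_text):
    """Map each NFS mount point to its (rsize, wsize) in bytes.

    mounts_text is the content of /proc/mounts.  A size is None when
    the corresponding option is not listed for that mount.
    """
    sizes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = dict(o.split("=", 1) for o in fields[3].split(",") if "=" in o)
        rsize = opts.get("rsize")
        wsize = opts.get("wsize")
        sizes[fields[1]] = (
            int(rsize) if rsize else None,
            int(wsize) if wsize else None,
        )
    return sizes


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, (rsize, wsize) in nfs_transfer_sizes(f.read()).items():
            print(f"{mount_point}: rsize={rsize} wsize={wsize}")
```

Mounting with an explicit rsize/wsize on the client and re-running this
shows whether the override took effect.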

>Any hints would be appreciated. Documentation or code paths welcome as
>are annotated /proc locations.

To clarify from your question - is this large blocksize causing a
performance problem?  I recall some applications having problems with
stdio "fread()" and friends reading too much data into their buffers if
they are doing random IO.  Ideally stdio shouldn't be reading more than it
needs when doing random IO.
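
For an application that does small random reads, one way to sidestep a
buffered layer that fills a buffer sized from st_blksize is to issue
positional reads for exactly the bytes needed.  A sketch using os.pread(),
which wraps pread(2); the helper name read_record is hypothetical, and in C
the analogous options would be pread(2) or a smaller buffer set via
setvbuf(3):

```python
import os


def read_record(path, offset, length):
    """Read up to `length` bytes at `offset` without a userspace buffer.

    os.pread() asks the kernel for exactly the requested range, so no
    extra data is pulled into a stdio-style buffer.  It may return
    fewer bytes than requested near the end of the file.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)
```

This only controls what the application requests; readahead in the NFS or
Lustre client may still fetch more than that from the server.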

At one time in the past, we derived the st_blksize from the file
stripe_size, but this caused problems with the NFS "Connectathon" or
similar.  It is currently limited by LL_MAX_BLKSIZE_BITS for all files,
but I wouldn't recommend reducing this directly, since it would also
affect "cp" and others that also depend on st_blksize for the "optimal IO
size".  It would be possible to reintroduce the per-file tunable in
ll_update_inode() I think.

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division
