linux-nfs.vger.kernel.org archive mirror
From: Trond Myklebust <trond.myklebust@primarydata.com>
To: Scott Mayhew <smayhew@redhat.com>
Cc: NeilBrown <neilb@suse.de>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] NFS: report more appropriate block size for directories.
Date: Wed, 13 May 2015 14:55:41 -0400
Message-ID: <1431543341.9235.10.camel@primarydata.com>
In-Reply-To: <20150508151459.GA63078@tonberry.usersys.redhat.com>

On Fri, 2015-05-08 at 11:14 -0400, Scott Mayhew wrote:
> On Fri, 08 May 2015, NeilBrown wrote:
> 
> > 
> > In glibc 2.21 (and several previous), a call to opendir() will
> > result in a 32K (BUFSIZ*4) buffer being allocated and passed to
> > getdents.
> > 
> > However a call to fdopendir() results in an 'fstat' request to
> > determine block size and a matching buffer allocated for subsequent
> > use with getdents.  This will typically be 1M.
> > 
> > The first getdents call on an NFS directory will always use
> > READDIR_PLUS (or NFSv4 equivalent) if available.  Subsequent getdents
> > calls only use this more expensive version if some 'stat' requests are
> > made between the getdents calls.
> > 
> > For this reason it is good to keep at least that first getdents call
> > relatively short.  When fdopendir() and readdir() are used on a large
> > directory, it takes approximately 32 times as long to complete as
> > using "opendir".  Current versions of 'find' use fdopendir() and
> > demonstrate this slowness.
> > 
> > 'stat' on a directory currently returns the 'wsize'.  This number has
> > no meaning on directories.
> > Actual READDIR requests are limited to ->dtsize, which itself is
> > capped at 4 pages, coincidentally the same as BUFSIZ*4.
> > So this is a meaningful number to use as the blocksize on directories,
> > and has the effect of making 'find' on large directories go a lot
> > faster.
> 
> Would it make sense to do something similar for regular files too?
> fopen() does a similar buffer allocation unless the application
> overrides the buffer size via setbuffer()/setvbuf().  That can then
> result in fseek() reading a lot of unnecessary data over the wire.
> 
> Prior to commit ba52de1 (inode-diet: Eliminate i_blksize from the inode
> structure), a stat() over nfs would return the page size in st_blksize,
> and for some workloads it does make a difference.  For instance, I have
> a customer running gdb in a diskless environment.  On a stock kernel
> where a stat() over nfs returns the wsize in st_blksize, their job takes
> ~19 minutes... on a test kernel where a stat() over nfs returns the page
> size instead, that same job takes ~13 minutes.  I hadn't sent a patch
> yet because I'm still trying to account for a few extra minutes of
> run time elsewhere...
> 
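A minimal userspace sketch of the two directory-buffer sizing paths described above, plus the setvbuf() override Scott mentions for regular files. It assumes a POSIX system with glibc, where BUFSIZ is 8192, so 4 * BUFSIZ is 32 KiB. The helpers opendir_style_bufsize(), fdopendir_style_bufsize(), report_dir_bufsizes() and fopen_with_buffer() are hypothetical illustrations, not glibc internals; glibc's real allocation logic may apply minimums or caps not shown here. fopen_with_buffer() passes a caller-owned buffer because glibc may ignore the size argument to setvbuf() when the buffer pointer is NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Buffer size the thread attributes to opendir(): 4 * BUFSIZ
 * (32 KiB with glibc, where BUFSIZ is 8192).  Hypothetical helper. */
static size_t opendir_style_bufsize(void)
{
    return 4 * (size_t)BUFSIZ;
}

/* Buffer size taken from the directory's st_blksize, which the thread
 * describes fdopendir() as doing.  Returns 0 if fstat() fails.
 * Hypothetical helper; not glibc's actual code. */
static size_t fdopendir_style_bufsize(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return 0;
    return (size_t)st.st_blksize;
}

/* Print both sizes for one directory; returns 0 on success, -1 on error. */
static int report_dir_bufsizes(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0) {
        perror(path);
        return -1;
    }
    printf("%s: opendir-style %zu bytes, st_blksize-based %zu bytes\n",
           path, opendir_style_bufsize(), fdopendir_style_bufsize(fd));
    close(fd);
    return 0;
}

/* Open a regular file with a caller-owned stdio buffer of bufsize bytes
 * instead of the default (st_blksize-derived) one.  buf must remain valid
 * until fclose().  A real buffer is passed because glibc may ignore the
 * size argument when buf is NULL.  Hypothetical helper. */
static FILE *fopen_with_buffer(const char *path, char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return NULL;
    /* setvbuf() must precede any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

On an NFS mount whose wsize is 1 MiB, report_dir_bufsizes() on a directory there would be expected to report 32768 against 1048576 bytes, a ratio of 32 that matches the slowdown NeilBrown reports above.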

The client shouldn't be reporting anything different after commit
ba52de1. We should have
  inode->i_blkbits = sb->s_blocksize_bits;

with
  sb->s_blocksize_bits being set as log2(sb->s_blocksize)

Previously, inode->i_blksize was the same as sb->s_blocksize.
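A worked example of that relationship, as a userspace sketch. block_bits() and reported_blksize() are hypothetical helpers, not the NFS client's code (though the client's nfs_block_bits() does something similar): a block size is rounded down to a power of two, its log2 is kept as the "bits" value, and the size stat() reports is recovered as 1 << bits. It assumes st_blksize is filled in as 1 << i_blkbits, as generic_fillattr() did in kernels of this era.

```c
#include <assert.h>
#include <limits.h>

/* Round bsize down to a power of two and store its log2 in *bits.
 * Hypothetical userspace helper; bsize must be non-zero. */
static unsigned long block_bits(unsigned long bsize, unsigned char *bits)
{
    const unsigned char max_bit = sizeof(bsize) * CHAR_BIT - 1;
    unsigned char nr = 0;

    assert(bsize != 0);
    /* Find the index of the highest set bit without shifting past the width. */
    while (nr < max_bit && (bsize >> (nr + 1)) != 0)
        nr++;
    *bits = nr;
    return 1UL << nr;
}

/* Block size as seen by stat(): 1 << i_blkbits (assumed, see above). */
static unsigned long reported_blksize(unsigned char i_blkbits)
{
    return 1UL << i_blkbits;
}
```

With a 1 MiB wsize, block_bits(1048576, &b) sets b to 20, and reported_blksize(20) gives back 1048576, so storing the log2 instead of the size loses nothing for power-of-two block sizes.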

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com




Thread overview: 3+ messages
2015-05-08  3:10 [PATCH] NFS: report more appropriate block size for directories NeilBrown
2015-05-08 15:14 ` Scott Mayhew
2015-05-13 18:55   ` Trond Myklebust [this message]
