linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Zach Brown <zab@redhat.com>, Abhijith Das <adas@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	cluster-devel <cluster-devel@redhat.com>
Subject: Re: [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls
Date: Thu, 31 Jul 2014 13:16:01 +1000
Message-ID: <20140731031601.GO26465@dastard>
In-Reply-To: <4356C960-C548-42AC-876E-106A1DAA85EE@dilger.ca>

On Mon, Jul 28, 2014 at 03:21:20PM -0600, Andreas Dilger wrote:
> On Jul 25, 2014, at 6:38 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Jul 25, 2014 at 10:52:57AM -0700, Zach Brown wrote:
> >> On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote:
> >>> Hi all,
> >>> 
> >>> The topic of a readdirplus-like syscall had come up for discussion at last year's
> >>> LSF/MM collab summit. I wrote a couple of syscalls with their GFS2 implementations
> >>> to get at a directory's entries as well as stat() info on the individual inodes.
> >>> I'm presenting these patches and some early test results on a single-node GFS2
> >>> filesystem.
> >>> 
> >>> 1. dirreadahead() - This patchset is very simple compared to the xgetdents() system
> >>> call below and scales very well for large directories in GFS2. dirreadahead() is
> >>> designed to be called prior to getdents+stat operations.
> >> 
> >> Hmm.  Have you tried plumbing these read-ahead calls in under the normal
> >> getdents() syscalls?
> > 
> > The issue is not directory block readahead (which some filesystems
> > like XFS already have), but issuing inode readahead during the
> > getdents() syscall.
> > 
> > It's the semi-random, interleaved inode IO that is being optimised
> > here (i.e. queued, ordered, issued, cached), not the directory
> > blocks themselves.
> 
> Sure.
> 
> > As such, why does this need to be done in the
> > kernel?  This can all be done in userspace, and even hidden within
> > the readdir() or ftw/nftw() implementations themselves so it's OS,
> > kernel and filesystem independent...
> 
> That assumes sorting by inode number maps to sorting by disk order.
> That isn't always true.

That's true, but it's a fair bet that roughly ascending inode number
ordering is going to be better than random ordering for most
filesystems.
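For illustration, a minimal userspace sketch of the "sort by inode number, then stat()" idea, assuming a POSIX system where readdir() fills in d_ino. This is not code from the patches in this thread; `struct ent`, `stat_dir_ino_order()` and `free_ents()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>       /* AT_SYMLINK_NOFOLLOW */
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record: one directory entry plus its stat() result. */
struct ent {
    ino_t ino;          /* d_ino as reported by readdir() */
    char *name;
    struct stat st;
    int stat_err;       /* 0 on success, else the errno from fstatat() */
};

static int cmp_ino(const void *a, const void *b)
{
    const struct ent *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

static void free_ents(struct ent *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].name);
    free(v);
}

/* Read all entries of 'path', sort them by inode number, then stat() them
 * in that order. Returns the entry count (array in *out) or -1 on error. */
static ssize_t stat_dir_ino_order(const char *path, struct ent **out)
{
    DIR *d = opendir(path);
    struct ent *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;

    if (!d)
        return -1;

    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            struct ent *nv = realloc(v, ncap * sizeof(*v));
            if (!nv) {
                free_ents(v, n);
                closedir(d);
                return -1;
            }
            v = nv;
            cap = ncap;
        }
        v[n].ino = de->d_ino;
        v[n].name = strdup(de->d_name);
        if (!v[n].name) {
            free_ents(v, n);
            closedir(d);
            return -1;
        }
        n++;
    }

    /* Ascending inode number: a cheap, filesystem-agnostic guess at disk order. */
    if (n > 1)
        qsort(v, n, sizeof(*v), cmp_ino);

    for (size_t i = 0; i < n; i++)
        v[i].stat_err = fstatat(dirfd(d), v[i].name, &v[i].st,
                                AT_SYMLINK_NOFOLLOW) == 0 ? 0 : errno;

    closedir(d);
    *out = v;
    return (ssize_t)n;
}
```

The sort only changes the order in which the synchronous stat() calls are issued; each call still blocks on its own inode read, which is the latency problem discussed next.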

Besides, ordering isn't the real problem - the real problem is the
latency caused by having to do the inode IO synchronously one stat()
at a time. Just multithread the damn thing in userspace so the
stat()s can be done asynchronously and hence be more optimally
ordered by the IO scheduler and completed before the application
blocks on the IO.
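A minimal sketch of multithreading the stat()s in userspace, assuming POSIX threads and C11 atomics. `parallel_stat()`, `struct stat_queue` and the 64-thread cap are hypothetical choices for illustration, not an existing API. Each worker claims the next path from a shared counter, so many inode reads can be outstanding at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/stat.h>

#define MAX_STAT_THREADS 64   /* arbitrary cap for this sketch */

/* Shared work queue: threads claim paths by bumping 'next'. */
struct stat_queue {
    const char *const *paths;
    size_t n;
    atomic_size_t next;   /* index of the next unclaimed path */
    atomic_size_t done;   /* lstat() calls that have returned */
};

static void *stat_worker(void *arg)
{
    struct stat_queue *q = arg;
    struct stat st;
    size_t i;

    /* Many stat()s in flight at once lets the IO scheduler merge and
     * reorder the inode reads instead of serving them one at a time. */
    while ((i = atomic_fetch_add(&q->next, 1)) < q->n) {
        (void)lstat(q->paths[i], &st);   /* result discarded: cache warm-up */
        atomic_fetch_add(&q->done, 1);
    }
    return NULL;
}

/* stat() every path using up to 'nthreads' threads, then wait for them.
 * Returns the number of lstat() calls made. */
static size_t parallel_stat(const char *const *paths, size_t n,
                            unsigned nthreads)
{
    pthread_t tids[MAX_STAT_THREADS];
    struct stat_queue q;
    unsigned started = 0;

    q.paths = paths;
    q.n = n;
    atomic_init(&q.next, 0);
    atomic_init(&q.done, 0);

    if (nthreads > MAX_STAT_THREADS)
        nthreads = MAX_STAT_THREADS;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, stat_worker, &q) == 0)
        started++;

    if (started == 0)
        stat_worker(&q);   /* no threads available: fall back to inline */

    for (unsigned t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    return atomic_load(&q.done);
}
```

Build with `cc -pthread`. This version still joins the workers before returning; whether that wait is needed at all is the next point.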

It doesn't even need completion synchronisation - the stat()
issued by the application will block until the async stat()
completes the process of bringing the inode into the kernel cache...
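Dropping the completion synchronisation can be sketched with detached threads: the prefetchers are started and never joined, and the application then issues its own stat() calls as usual, relying on the blocking behaviour described above. A minimal sketch assuming POSIX threads and C11 atomics; `start_prefetch()` and `struct job` are hypothetical names. The job owns a private copy of the paths and is freed by whichever thread drops the last reference.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical job shared by the detached prefetch threads. */
struct job {
    char **paths;         /* private copies, so the caller never waits */
    size_t n;
    atomic_size_t next;   /* next path to claim */
    atomic_uint refs;     /* live workers + the starter while it runs */
};

static void job_free(struct job *j)
{
    for (size_t i = 0; i < j->n; i++)
        free(j->paths[i]);
    free(j->paths);
    free(j);
}

static void *prefetch_detached(void *arg)
{
    struct job *j = arg;
    struct stat st;
    size_t i;

    while ((i = atomic_fetch_add(&j->next, 1)) < j->n)
        (void)lstat(j->paths[i], &st);   /* only warms the inode cache */

    /* Last reference out frees the job; nobody joins these threads. */
    if (atomic_fetch_sub(&j->refs, 1) == 1)
        job_free(j);
    return NULL;
}

/* Start up to 'nthreads' detached prefetchers and return immediately.
 * Returns the number of threads started (0 means no prefetch). */
static unsigned start_prefetch(const char *const *paths, size_t n,
                               unsigned nthreads)
{
    struct job *j = calloc(1, sizeof(*j));
    if (!j)
        return 0;
    j->paths = calloc(n ? n : 1, sizeof(char *));
    if (!j->paths) {
        free(j);
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        j->paths[i] = strdup(paths[i]);
        if (!j->paths[i]) {
            j->n = i;
            job_free(j);
            return 0;
        }
    }
    j->n = n;
    atomic_init(&j->next, 0);
    atomic_init(&j->refs, 1);   /* starter's reference during creation */

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    unsigned started = 0;
    for (unsigned t = 0; t < nthreads; t++) {
        pthread_t tid;
        atomic_fetch_add(&j->refs, 1);
        if (pthread_create(&tid, &attr, prefetch_detached, j) != 0) {
            atomic_fetch_sub(&j->refs, 1);
            break;
        }
        started++;
    }
    pthread_attr_destroy(&attr);

    /* Drop the starter's reference; frees the job if all workers are done. */
    if (atomic_fetch_sub(&j->refs, 1) == 1)
        job_free(j);
    return started;
}
```

Usage would be to call `start_prefetch(names, n, 8)` right after reading the directory and then run the application's normal stat() loop over the same names, with no join or wait in between.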

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140731031601.GO26465@dastard \
    --to=david@fromorbit.com \
    --cc=adas@redhat.com \
    --cc=adilger@dilger.ca \
    --cc=cluster-devel@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zab@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).