From: Abhijith Das <adas@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls
Date: Mon, 28 Jul 2014 08:22:22 -0400 (EDT)
Message-ID: <308078610.14129388.1406550142526.JavaMail.zimbra@redhat.com>
In-Reply-To: <20140726003859.GF20518@dastard>
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "Zach Brown" <zab@redhat.com>
> Cc: "Abhijith Das" <adas@redhat.com>, linux-kernel@vger.kernel.org, "linux-fsdevel" <linux-fsdevel@vger.kernel.org>,
> "cluster-devel" <cluster-devel@redhat.com>
> Sent: Friday, July 25, 2014 7:38:59 PM
> Subject: Re: [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls
>
> On Fri, Jul 25, 2014 at 10:52:57AM -0700, Zach Brown wrote:
> > On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote:
> > > Hi all,
> > >
> > > The topic of a readdirplus-like syscall had come up for discussion at
> > > last year's LSF/MM collab summit. I wrote a couple of syscalls with
> > > their GFS2 implementations to get at a directory's entries as well as
> > > stat() info on the individual inodes.
> > > I'm presenting these patches and some early test results on a
> > > single-node GFS2 filesystem.
> > >
> > > 1. dirreadahead() - This patchset is very simple compared to the
> > > xgetdents() system call below and scales very well for large
> > > directories in GFS2. dirreadahead() is designed to be called prior to
> > > getdents+stat operations.
> >
> > Hmm. Have you tried plumbing these read-ahead calls in under the normal
> > getdents() syscalls?
>
> The issue is not directory block readahead (which some filesystems
> like XFS already have), but issuing inode readahead during the
> getdents() syscall.
>
> It's the semi-random, interleaved inode IO that is being optimised
> here (i.e. queued, ordered, issued, cached), not the directory
> blocks themselves. As such, why does this need to be done in the
> kernel? This can all be done in userspace, and even hidden within
> the readdir() or ftw/nftw() implementations themselves so it's OS,
> kernel and filesystem independent......
>
I don't see how the inode reads can be sorted into disk block order in
userland without knowing the fs-specific topology. In my observations, the
performance gain is greatest when we can order the reads so that seek times
are minimized on rotational media. I have not tested my patches on SSDs,
but my guess is that the gain there would be minimal, if any.
Cheers!
--Abhi