From: Abhijith Das <adas@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC PATCH 0/2] dirreadahead system call
Date: Mon, 28 Jul 2014 08:52:14 -0400 (EDT)
Message-ID: <193414027.14151264.1406551934098.JavaMail.zimbra@redhat.com>
In-Reply-To: <CA2BF8AB-6F61-4856-8B0E-9D954BDEB243@dilger.ca>



----- Original Message -----
> From: "Andreas Dilger" <adilger@dilger.ca>
> To: "Abhi Das" <adas@redhat.com>
> Cc: linux-kernel at vger.kernel.org, linux-fsdevel at vger.kernel.org, cluster-devel at redhat.com
> Sent: Saturday, July 26, 2014 12:27:19 AM
> Subject: Re: [RFC PATCH 0/2] dirreadahead system call
> 
> Is there a time when this doesn't get called to prefetch entries in
> readdir() order?  It isn't clear to me what benefit there is of returning
> the entries to userspace instead of just doing the statahead implicitly
> in the kernel?
> 
> The Lustre client has had what we call "statahead" for a while,
> and similar to regular file readahead it detects the sequential access
> pattern for readdir() + stat() in readdir() order (taking into account if
> ".*" entries are being processed or not) and starts fetching the inode
> attributes asynchronously with a worker thread.

Does this heuristic work well in practice? In the use case we were trying to
address, a Samba server knows beforehand whether it is going to stat all the
inodes in a directory.

> 
> This syscall might be more useful if userspace called readdir() to get
> the dirents and then passed the kernel the list of inode numbers
> to prefetch before starting on the stat() calls. That way, userspace
> could generate an arbitrary list of inodes (e.g. names matching a
> regexp) and the kernel doesn't need to guess if every inode is needed.

Were you thinking of arbitrary inodes across the filesystem, or just a subset
from one directory? Arbitrary inodes could raise locking issues.
But yes, as Steve mentioned in a previous email, limiting the inodes read
ahead in some fashion other than a range in readdir() order is something
that we are considering (a list of inodes based on regexps, filenames, etc.).
We chose an offset range of the directory only for a quick, early
implementation.
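
To make the "list of inodes" idea concrete, here is a rough userspace sketch
of the selection step only: read the directory, keep the entries whose names
match a regexp, and collect their inode numbers. The function name
select_inodes() is hypothetical, and no prefetch call is shown because no such
interface exists yet; this only illustrates what a caller might hand to the
kernel.

```python
import os
import re


def select_inodes(dirpath, pattern):
    """Return inode numbers of entries in dirpath whose names match pattern.

    Hypothetical helper: it only builds the candidate list that a
    list-based readahead interface might accept. Nothing is prefetched.
    """
    regex = re.compile(pattern)
    inodes = []
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if regex.search(entry.name):
                # On Linux the inode number comes from the directory entry
                # itself, so no per-file stat() is needed here.
                inodes.append(entry.inode())
    return inodes
```

For example, select_inodes("/some/dir", r"\.txt$") would return the inode
numbers of the *.txt entries, in whatever order the directory yields them.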

> 
> As it stands, this syscall doesn't help in anything other than readdir
> order (or if the directory is small enough to be handled in one
> syscall), which could be handled by the kernel internally already,
> and it may fetch a considerable number of extra inodes from
> disk if not every inode needs to be touched.

The need for this syscall arose from a specific use case: Samba. I'm told
that Windows clients like to stat every file in a directory as soon as it is
read in, and this has been a slow operation.
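
For reference, the access pattern in question looks roughly like the sketch
below: one pass over the directory, then one stat per entry in the order the
directory returned them. This is only an illustration of the pattern, not
Samba's actual code, and stat_all() is a made-up name.

```python
import os


def stat_all(dirpath):
    """Return {name: os.stat_result} for every entry in dirpath.

    Entries are visited in the order the directory listing returns them,
    which is the readdir()-order pattern discussed above.
    """
    results = {}
    for name in os.listdir(dirpath):
        # One stat call per entry; each call may have to wait for the
        # inode to be read in if it is not already cached.
        results[name] = os.lstat(os.path.join(dirpath, name))
    return results
```

Each lstat() here is issued only after the previous one returns, which is
where reading the inodes ahead of time would be expected to help.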

Cheers!
--Abhi



