linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@clusterfs.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Sage Weil <sage@newdream.net>,
	Christoph Hellwig <hch@infradead.org>,
	Brad Boyer <flar@allandria.com>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	Gary Grider <ggrider@lanl.gov>,
	linux-fsdevel@vger.kernel.org
Subject: Re: readdirplus() as possible POSIX I/O API
Date: Tue, 5 Dec 2006 03:26:44 -0700	[thread overview]
Message-ID: <20061205102644.GH5937@schatzie.adilger.int> (raw)
In-Reply-To: <1165245336.711.176.camel@lade.trondhjem.org>

On Dec 04, 2006  10:15 -0500, Trond Myklebust wrote:
> I propose that we implement this sort of thing in the kernel via a readdir
> equivalent to posix_fadvise(). That can give exactly the barrier
> semantics that they are asking for, and only costs 1 extra syscall as
> opposed to 2 (opendirplus() and readdirplus()).

I think the "barrier semantics" are something that have just crept
into this discussion and is confusing the issue.

The primary goal (IMHO) of this syscall is to allow the filesystem
(primarily distributed cluster filesystems, but HFS and NTFS developers
seem on board with this too) to avoid tens to thousands of stat RPCs in
very common ls -R, find, etc. kind of operations.

I can't see how fadvise() could help this case?  Yes, it would tell the
filesystem that it could do readahead of the readdir() data, but the
app will still be doing stat() on each of the thousands of files in the
directory, instantiating inodes and dentries on that node (which need
locking, and potentially immediate lock revocation if the files are
being written to by other nodes).  In some cases (e.g. rm -r, grep -r)
that might even be a win, because the client will soon be touching all
of those files, but not necessarily in the ls -lR, find cases.

The filesystem can't always do "stat-ahead" on the files because that
requires instantiating an inode on the client which may be stale (lock
revoked) by the time the app gets to it, and the app (and the VFS)  have
no idea just how stale it is, and whether the stat is a "real" stat or
"only" the readdir stat (because the fadvise would only be useful on
the directory, and not all of the child entries), so it would need to
re-stat the file.  Also, this would potentially blow the client's real
working set of inodes out of cache.

Doing things en-masse with readdirplus() also allows the filesystem to
do the stat() operations in parallel internally (which is a net win if
there are many servers involved) instead of serially as the application
would do.

Cheers, Andreas

PS - I changed the topic to separate this from the openfh() thread.
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


  parent reply	other threads:[~2006-12-05 10:26 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-28  4:34 NFSv4/pNFS possible POSIX I/O API standards Gary Grider
2006-11-28  5:54 ` Christoph Hellwig
2006-11-28 10:54   ` Andreas Dilger
2006-11-28 11:28     ` Anton Altaparmakov
2006-11-28 20:17     ` Russell Cattelan
2006-11-28 23:28     ` Wendy Cheng
2006-11-29  9:12       ` Christoph Hellwig
2006-11-29  9:04   ` Christoph Hellwig
2006-11-29  9:14     ` Christoph Hellwig
2006-11-29  9:48     ` Andreas Dilger
2006-11-29 10:18       ` Anton Altaparmakov
2006-11-29  8:26         ` Brad Boyer
2006-11-30  9:25           ` Christoph Hellwig
2006-11-30 17:49             ` Sage Weil
2006-12-01  5:26               ` Trond Myklebust
2006-12-01  7:08                 ` Sage Weil
2006-12-01 14:41                   ` Trond Myklebust
2006-12-01 16:47                     ` Sage Weil
2006-12-01 18:07                       ` Trond Myklebust
2006-12-01 18:42                         ` Sage Weil
2006-12-01 19:13                           ` Trond Myklebust
2006-12-01 20:32                             ` Sage Weil
2006-12-04 18:02                           ` Peter Staubach
2006-12-05 23:20                             ` readdirplus() as possible POSIX I/O API Sage Weil
2006-12-06 15:48                               ` Peter Staubach
2006-12-03  1:57                         ` NFSv4/pNFS possible POSIX I/O API standards Andreas Dilger
2006-12-03  7:34                           ` Kari Hurtta
2006-12-03  1:52                     ` Andreas Dilger
2006-12-03 16:10                       ` Sage Weil
2006-12-04  7:32                         ` Andreas Dilger
2006-12-04 15:15                           ` Trond Myklebust
2006-12-05  0:59                             ` Rob Ross
2006-12-05  4:44                               ` Gary Grider
2006-12-05 10:05                                 ` Christoph Hellwig
2006-12-05  5:56                               ` Trond Myklebust
2006-12-05 10:07                                 ` Christoph Hellwig
2006-12-05 14:20                                   ` Matthew Wilcox
2006-12-06 15:04                                     ` Rob Ross
2006-12-06 15:44                                       ` Matthew Wilcox
2006-12-06 16:15                                         ` Rob Ross
2006-12-05 14:55                                   ` Trond Myklebust
2006-12-05 22:11                                     ` Rob Ross
2006-12-05 23:24                                       ` Trond Myklebust
2006-12-06 16:42                                         ` Rob Ross
2006-12-06 12:22                                     ` Ragnar Kjørstad
2006-12-06 15:14                                       ` Trond Myklebust
2006-12-05 16:55                                   ` Latchesar Ionkov
2006-12-05 22:12                                     ` Christoph Hellwig
2006-12-06 23:12                                       ` Latchesar Ionkov
2006-12-06 23:33                                         ` Trond Myklebust
2006-12-05 21:50                                   ` Rob Ross
2006-12-05 22:05                                     ` Christoph Hellwig
2006-12-05 23:18                                       ` Sage Weil
2006-12-05 23:55                                       ` Ulrich Drepper
2006-12-06 10:06                                         ` Andreas Dilger
2006-12-06 17:19                                           ` Ulrich Drepper
2006-12-06 17:27                                             ` Rob Ross
2006-12-06 17:42                                               ` Ulrich Drepper
2006-12-06 18:01                                                 ` Ragnar Kjørstad
2006-12-06 18:13                                                   ` Ulrich Drepper
2006-12-17 14:41                                                     ` Ragnar Kjørstad
2006-12-17 19:07                                                       ` Ulrich Drepper
2006-12-17 19:38                                                         ` Matthew Wilcox
2006-12-17 21:51                                                           ` Ulrich Drepper
2006-12-18  2:57                                                             ` Ragnar Kjørstad
2006-12-18  3:54                                                               ` Gary Grider
2006-12-07  5:57                                                 ` Andreas Dilger
2006-12-15 22:37                                                   ` Ulrich Drepper
2006-12-16 18:13                                                     ` Andreas Dilger
2006-12-16 19:08                                                       ` Ulrich Drepper
2006-12-14 23:58                                         ` statlite() Rob Ross
2006-12-07 23:39                                       ` NFSv4/pNFS possible POSIX I/O API standards Nikita Danilov
2006-12-05 14:37                               ` Peter Staubach
2006-12-05 10:26                             ` Andreas Dilger [this message]
2006-12-05 15:23                               ` readdirplus() as possible POSIX I/O API Trond Myklebust
2006-12-06 10:28                                 ` Andreas Dilger
2006-12-06 15:10                                   ` Trond Myklebust
2006-12-05 17:06                               ` Latchesar Ionkov
2006-12-05 22:48                                 ` Rob Ross
2006-11-29 10:25       ` NFSv4/pNFS possible POSIX I/O API standards Steven Whitehouse
2006-11-30 12:29         ` Christoph Hellwig
2006-12-01 15:52       ` Ric Wheeler
2006-11-29 12:23     ` Matthew Wilcox
2006-11-29 12:35       ` Matthew Wilcox
2006-11-29 16:26         ` Gary Grider
2006-11-29 17:18           ` Christoph Hellwig
2006-11-29 12:39       ` Christoph Hellwig
2006-12-01 22:29         ` Rob Ross
2006-12-02  2:35           ` Latchesar Ionkov
2006-12-05  0:37             ` Rob Ross
2006-12-05 10:02               ` Christoph Hellwig
2006-12-05 16:47               ` Latchesar Ionkov
2006-12-05 17:01                 ` Matthew Wilcox
     [not found]                   ` <f158dc670612050909m366594c5ubaa87d9a9ecc8c2a@mail.gmail.com>
2006-12-05 17:10                     ` Latchesar Ionkov
2006-12-05 17:39                     ` Matthew Wilcox
2006-12-05 21:55                       ` Rob Ross
2006-12-05 21:50                   ` Peter Staubach
2006-12-05 21:44                 ` Rob Ross
2006-12-06 11:01                   ` openg Christoph Hellwig
2006-12-06 15:41                     ` openg Trond Myklebust
2006-12-06 15:42                     ` openg Rob Ross
2006-12-06 23:32                       ` openg Christoph Hellwig
2006-12-14 23:36                         ` openg Rob Ross
2006-12-06 23:25                   ` Re: NFSv4/pNFS possible POSIX I/O API standards Latchesar Ionkov
2006-12-06  9:48                 ` David Chinner
2006-12-06 15:53                   ` openg and path_to_handle Rob Ross
2006-12-06 16:04                     ` Matthew Wilcox
2006-12-06 16:20                       ` Rob Ross
2006-12-06 20:57                         ` David Chinner
2006-12-06 20:40                     ` David Chinner
2006-12-06 20:50                       ` Matthew Wilcox
2006-12-06 21:09                         ` David Chinner
2006-12-06 22:09                         ` Andreas Dilger
2006-12-06 22:17                           ` Matthew Wilcox
2006-12-06 22:41                             ` Andreas Dilger
2006-12-06 23:39                           ` Christoph Hellwig
2006-12-14 22:52                             ` Rob Ross
2006-12-06 20:50                       ` Rob Ross
2006-12-06 21:01                         ` David Chinner
2006-12-06 23:19                     ` Latchesar Ionkov
2006-12-14 21:00                       ` Rob Ross
2006-12-14 21:20                         ` Matthew Wilcox
2006-12-14 23:02                           ` Rob Ross
2006-11-28 15:08 ` NFSv4/pNFS possible POSIX I/O API standards Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061205102644.GH5937@schatzie.adilger.int \
    --to=adilger@clusterfs.com \
    --cc=aia21@cam.ac.uk \
    --cc=flar@allandria.com \
    --cc=ggrider@lanl.gov \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sage@newdream.net \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).