From: Rob Ross <rross@mcs.anl.gov>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Andreas Dilger <adilger@clusterfs.com>,
Sage Weil <sage@newdream.net>,
Christoph Hellwig <hch@infradead.org>,
Brad Boyer <flar@allandria.com>,
Anton Altaparmakov <aia21@cam.ac.uk>,
Gary Grider <ggrider@lanl.gov>,
linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Mon, 04 Dec 2006 18:59:54 -0600
Message-ID: <4574C48A.8030007@mcs.anl.gov>
In-Reply-To: <1165245336.711.176.camel@lade.trondhjem.org>

Hi all,

I don't think that the group intended that there be an opendirplus();
rather, readdirplus() would simply be called instead of the usual
readdir(). We should clarify that.

Regarding Peter Staubach's comments about no one ever using the
readdirplus() call: well, if people weren't performing this workload in
the first place, we wouldn't *need* this sort of call! This call is
specifically targeted at improving "ls -l" performance on large
directories, and Sage has pointed out quite nicely how that might work.

In our case (PVFS), we would essentially perform three phases of
communication with the file system for a readdirplus that was obtaining
full statistics: first grabbing the directory entries, then obtaining
metadata from servers on all objects in bulk, then gathering file sizes
in bulk. The reduction in control message traffic is enormous, and the
concurrency is much greater than in a readdir()+stat()s workload. We'd
never perform this sort of optimization optimistically, as the cost of
guessing wrong is just too high. We would want to see the call as a
proper VFS operation that we could act upon.

The entire readdirplus() operation wasn't intended to be atomic. In
fact, the returned structure has space for an error associated with the
stat() on each entry. This allows for implementations that stat() each
entry after reading it and get an error because the object was removed
between the time its entry was read out of the directory and the time
the stat was performed. I think this fits well with what Andreas and
others are thinking. We should clarify the description appropriately.

I don't think that we have a readdirpluslite() variation documented yet?
Gary? It would make a lot of sense, except that it should probably have
a better name...

Regarding Andreas's note that he would prefer the statlite() flags to
mean "valid": that makes good sense to me (and would obviously apply to
the so-far even more hypothetical readdirpluslite()). I don't think
there's a lot of value in returning possibly inaccurate values.
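The "valid" semantics can be sketched as follows, assuming hypothetical flag names (SLITE_SIZE, SLITE_MTIME) and a hypothetical struct slite. The actual statlite() proposal may use different names and fields. The caller passes the fields it wants. On return, a set bit means the field is accurate, and a clear bit means the field must not be read. This emulation simply calls stat(). A parallel file system could instead skip gathering file sizes entirely when SLITE_SIZE is not requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical flag bits and result struct, for illustration only. */
#define SLITE_SIZE  0x1u
#define SLITE_MTIME 0x2u

struct slite {
    unsigned valid;   /* out: bit set means the field is accurate */
    off_t    size;
    time_t   mtime;
};

/*
 * Fill only the requested fields and mark exactly those as valid.
 * Unrequested fields are left zeroed with their bits clear; nothing
 * possibly stale is ever reported.
 */
int statlite_emul(const char *path, unsigned want, struct slite *out)
{
    struct stat st;
    memset(out, 0, sizeof *out);
    if (stat(path, &st) != 0)
        return -1;
    if (want & SLITE_SIZE) {
        out->size = st.st_size;
        out->valid |= SLITE_SIZE;
    }
    if (want & SLITE_MTIME) {
        out->mtime = st.st_mtime;
        out->valid |= SLITE_MTIME;
    }
    return 0;
}
```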

Thanks everyone,
Rob

Trond Myklebust wrote:
> On Mon, 2006-12-04 at 00:32 -0700, Andreas Dilger wrote:
>>> I'm wondering if a corresponding opendirplus() (or similar) would also be
>>> appropriate to inform the kernel/filesystem that readdirplus() will
>>> follow, and stat information should be gathered/buffered. Or do most
>>> implementations wait for the first readdir() before doing any actual work
>>> anyway?
>> I'm not sure what some filesystems might do here. I suppose NFS has weak
>> enough cache semantics that it _might_ return stale cached data from the
>> client in order to fill the readdirplus() data, but it is just as likely
>> that it ships the whole thing to the server and returns everything in
>> one shot. That would imply everything would be at least as up-to-date
>> as the opendir().
>
> Whether or not the posix committee decides on readdirplus, I propose
> that we implement this sort of thing in the kernel via a readdir
> equivalent to posix_fadvise(). That can give exactly the barrier
> semantics that they are asking for, and only costs 1 extra syscall as
> opposed to 2 (opendirplus() and readdirplus()).
>
> Cheers
> Trond
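Trond's proposal above is modelled on posix_fadvise(), which already exists for regular files. The sketch below shows that existing pattern: one advisory call after open(), followed by ordinary reads. No directory equivalent exists. Under his proposal, a similar hint would be issued on the directory's descriptor after opendir() and before the readdir() loop, and any name for such a call would be hypothetical. read_with_hint() is an illustrative helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Existing model for the proposal: a single advisory call tells the
 * kernel what access is coming, and the following read() is ordinary.
 */
ssize_t read_with_hint(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Advisory only. posix_fadvise() returns an error number on failure
     * (it does not set errno), and a failure here is not fatal. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```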