From: Sage Weil <sage@newdream.net>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Hellwig <hch@infradead.org>,
Brad Boyer <flar@allandria.com>,
Anton Altaparmakov <aia21@cam.ac.uk>,
Andreas Dilger <adilger@clusterfs.com>,
Gary Grider <ggrider@lanl.gov>,
linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Thu, 30 Nov 2006 23:08:41 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.62.0611302157580.10257@wtf.di.newdream.net> (raw)
In-Reply-To: <1164950795.5761.25.camel@lade.trondhjem.org>
On Fri, 1 Dec 2006, Trond Myklebust wrote:
> What exactly do you mean by an "atomic readdirplus"? Standard readdir is
> by its very nature weakly cached, and there is no guarantee whatsoever
> even that you will see all files in the directory. See the SuSv3
> definition, which explicitly states that there is no ordering w.r.t.
> file creation/deletion:
>
> The type DIR, which is defined in the <dirent.h> header,
> represents a directory stream, which is an ordered sequence of
> all the directory entries in a particular directory. Directory
> entries represent files; files may be removed from a directory
> or added to a directory asynchronously to the operation of
> readdir().
I mean atomic only in the sense that the stat result returned by
readdirplus() would reflect the file state at some point during the time
consumed by that system call. In contrast, when you call stat()
separately, it's expected that the result you get back reflects the state
at some time during the stat() call, and not the readdir() that may
have preceeded it. readdir() results may be weakly cached, but stat()
results normally aren't (ignoring the usual NFS behavior for the moment).
It's the stat() part of readdir() + stat() that makes life unnecessarily
difficult for a filesystem providing strong consistency. How can the
filesystem know that 'ls' doesn't care if the stat() results are accurate
at the time of the readdir() and not the subsequent stat()? Something
like readdirplus() allows that to be explicitly communicated, without
resorting to heuristics or weak metadata consistency (ala NFS attribute
caching). For distributed or network filesystems that can be a big win.
(Admittedly, there's probably little benefit for local filesystems beyond
the possibility of better prefetching, if syscalls are as cheap as
Christoph says.)
> Besides, why would your application care about atomicity of the
> attribute information unless you also have some form of locking to
> guarantee that said information remains valid until you are done
> processing it?
Something like 'ls' certainly doesn't care, but in general applications do
care that stat() results aren't cached. They expect the stat results to
reflect the file's state at a point in time _after_ they decide to call
stat(). For example, for process A to see how much data a just-finished
process B wrote to a file...
sage
next prev parent reply other threads:[~2006-12-01 7:08 UTC|newest]
Thread overview: 124+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-28 4:34 NFSv4/pNFS possible POSIX I/O API standards Gary Grider
2006-11-28 5:54 ` Christoph Hellwig
2006-11-28 10:54 ` Andreas Dilger
2006-11-28 11:28 ` Anton Altaparmakov
2006-11-28 20:17 ` Russell Cattelan
2006-11-28 23:28 ` Wendy Cheng
2006-11-29 9:12 ` Christoph Hellwig
2006-11-29 9:04 ` Christoph Hellwig
2006-11-29 9:14 ` Christoph Hellwig
2006-11-29 9:48 ` Andreas Dilger
2006-11-29 10:18 ` Anton Altaparmakov
2006-11-29 8:26 ` Brad Boyer
2006-11-30 9:25 ` Christoph Hellwig
2006-11-30 17:49 ` Sage Weil
2006-12-01 5:26 ` Trond Myklebust
2006-12-01 7:08 ` Sage Weil [this message]
2006-12-01 14:41 ` Trond Myklebust
2006-12-01 16:47 ` Sage Weil
2006-12-01 18:07 ` Trond Myklebust
2006-12-01 18:42 ` Sage Weil
2006-12-01 19:13 ` Trond Myklebust
2006-12-01 20:32 ` Sage Weil
2006-12-04 18:02 ` Peter Staubach
2006-12-05 23:20 ` readdirplus() as possible POSIX I/O API Sage Weil
2006-12-06 15:48 ` Peter Staubach
2006-12-03 1:57 ` NFSv4/pNFS possible POSIX I/O API standards Andreas Dilger
2006-12-03 7:34 ` Kari Hurtta
2006-12-03 1:52 ` Andreas Dilger
2006-12-03 16:10 ` Sage Weil
2006-12-04 7:32 ` Andreas Dilger
2006-12-04 15:15 ` Trond Myklebust
2006-12-05 0:59 ` Rob Ross
2006-12-05 4:44 ` Gary Grider
2006-12-05 10:05 ` Christoph Hellwig
2006-12-05 5:56 ` Trond Myklebust
2006-12-05 10:07 ` Christoph Hellwig
2006-12-05 14:20 ` Matthew Wilcox
2006-12-06 15:04 ` Rob Ross
2006-12-06 15:44 ` Matthew Wilcox
2006-12-06 16:15 ` Rob Ross
2006-12-05 14:55 ` Trond Myklebust
2006-12-05 22:11 ` Rob Ross
2006-12-05 23:24 ` Trond Myklebust
2006-12-06 16:42 ` Rob Ross
2006-12-06 12:22 ` Ragnar Kjørstad
2006-12-06 15:14 ` Trond Myklebust
2006-12-05 16:55 ` Latchesar Ionkov
2006-12-05 22:12 ` Christoph Hellwig
2006-12-06 23:12 ` Latchesar Ionkov
2006-12-06 23:33 ` Trond Myklebust
2006-12-05 21:50 ` Rob Ross
2006-12-05 22:05 ` Christoph Hellwig
2006-12-05 23:18 ` Sage Weil
2006-12-05 23:55 ` Ulrich Drepper
2006-12-06 10:06 ` Andreas Dilger
2006-12-06 17:19 ` Ulrich Drepper
2006-12-06 17:27 ` Rob Ross
2006-12-06 17:42 ` Ulrich Drepper
2006-12-06 18:01 ` Ragnar Kjørstad
2006-12-06 18:13 ` Ulrich Drepper
2006-12-17 14:41 ` Ragnar Kjørstad
2006-12-17 19:07 ` Ulrich Drepper
2006-12-17 19:38 ` Matthew Wilcox
2006-12-17 21:51 ` Ulrich Drepper
2006-12-18 2:57 ` Ragnar Kjørstad
2006-12-18 3:54 ` Gary Grider
2006-12-07 5:57 ` Andreas Dilger
2006-12-15 22:37 ` Ulrich Drepper
2006-12-16 18:13 ` Andreas Dilger
2006-12-16 19:08 ` Ulrich Drepper
2006-12-14 23:58 ` statlite() Rob Ross
2006-12-07 23:39 ` NFSv4/pNFS possible POSIX I/O API standards Nikita Danilov
2006-12-05 14:37 ` Peter Staubach
2006-12-05 10:26 ` readdirplus() as possible POSIX I/O API Andreas Dilger
2006-12-05 15:23 ` Trond Myklebust
2006-12-06 10:28 ` Andreas Dilger
2006-12-06 15:10 ` Trond Myklebust
2006-12-05 17:06 ` Latchesar Ionkov
2006-12-05 22:48 ` Rob Ross
2006-11-29 10:25 ` NFSv4/pNFS possible POSIX I/O API standards Steven Whitehouse
2006-11-30 12:29 ` Christoph Hellwig
2006-12-01 15:52 ` Ric Wheeler
2006-11-29 12:23 ` Matthew Wilcox
2006-11-29 12:35 ` Matthew Wilcox
2006-11-29 16:26 ` Gary Grider
2006-11-29 17:18 ` Christoph Hellwig
2006-11-29 12:39 ` Christoph Hellwig
2006-12-01 22:29 ` Rob Ross
2006-12-02 2:35 ` Latchesar Ionkov
2006-12-05 0:37 ` Rob Ross
2006-12-05 10:02 ` Christoph Hellwig
2006-12-05 16:47 ` Latchesar Ionkov
2006-12-05 17:01 ` Matthew Wilcox
[not found] ` <f158dc670612050909m366594c5ubaa87d9a9ecc8c2a@mail.gmail.com>
2006-12-05 17:10 ` Latchesar Ionkov
2006-12-05 17:39 ` Matthew Wilcox
2006-12-05 21:55 ` Rob Ross
2006-12-05 21:50 ` Peter Staubach
2006-12-05 21:44 ` Rob Ross
2006-12-06 11:01 ` openg Christoph Hellwig
2006-12-06 15:41 ` openg Trond Myklebust
2006-12-06 15:42 ` openg Rob Ross
2006-12-06 23:32 ` openg Christoph Hellwig
2006-12-14 23:36 ` openg Rob Ross
2006-12-06 23:25 ` Re: NFSv4/pNFS possible POSIX I/O API standards Latchesar Ionkov
2006-12-06 9:48 ` David Chinner
2006-12-06 15:53 ` openg and path_to_handle Rob Ross
2006-12-06 16:04 ` Matthew Wilcox
2006-12-06 16:20 ` Rob Ross
2006-12-06 20:57 ` David Chinner
2006-12-06 20:40 ` David Chinner
2006-12-06 20:50 ` Matthew Wilcox
2006-12-06 21:09 ` David Chinner
2006-12-06 22:09 ` Andreas Dilger
2006-12-06 22:17 ` Matthew Wilcox
2006-12-06 22:41 ` Andreas Dilger
2006-12-06 23:39 ` Christoph Hellwig
2006-12-14 22:52 ` Rob Ross
2006-12-06 20:50 ` Rob Ross
2006-12-06 21:01 ` David Chinner
2006-12-06 23:19 ` Latchesar Ionkov
2006-12-14 21:00 ` Rob Ross
2006-12-14 21:20 ` Matthew Wilcox
2006-12-14 23:02 ` Rob Ross
2006-11-28 15:08 ` NFSv4/pNFS possible POSIX I/O API standards Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.62.0611302157580.10257@wtf.di.newdream.net \
--to=sage@newdream.net \
--cc=adilger@clusterfs.com \
--cc=aia21@cam.ac.uk \
--cc=flar@allandria.com \
--cc=ggrider@lanl.gov \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=trond.myklebust@fys.uio.no \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).