linux-fsdevel.vger.kernel.org archive mirror
From: Sage Weil <sage@newdream.net>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Hellwig <hch@infradead.org>,
	Brad Boyer <flar@allandria.com>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	Andreas Dilger <adilger@clusterfs.com>,
	Gary Grider <ggrider@lanl.gov>,
	linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Fri, 1 Dec 2006 08:47:31 -0800 (PST)
Message-ID: <Pine.LNX.4.62.0612010846400.10257@wtf.di.newdream.net>
In-Reply-To: <1164984094.5761.86.camel@lade.trondhjem.org>

On Fri, 1 Dec 2006, Trond Myklebust wrote:
> 'ls --color' and 'find' don't give a toss about most of the arguments 
> from 'stat()'. They just want to know what kind of filesystem object 
> they are dealing with. We already provide that information in the 
> readdir() syscall via the 'd_type' field. Adding all the other stat() 
> information is just going to add unnecessary synchronisation burdens.

'ls -al' cares about the stat() results, but does not care about the 
relative timing accuracy with respect to the preceding readdir().  I'm not 
sure why 'ls --color' still calls stat() when it can get the file type from 
the readdir() results, but either way it's asking more from the 
kernel/filesystem than it needs.
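
To make the d_type point concrete, here is a minimal sketch (not from the 
thread) using Python's os.scandir(), which exposes the entry type that 
readdir() returns.  The helper name classify_entries is made up for 
illustration, and on filesystems that do not fill in d_type the library 
falls back to a stat call internally, so this only avoids the extra call 
where the filesystem cooperates.

```python
import os


def classify_entries(path):
    """Map each name in `path` to 'dir', 'file', 'symlink' or 'other'.

    Relies on the type information carried by the directory entry
    (d_type on Linux) where the filesystem provides it; Python falls
    back to a stat call internally when it does not.
    """
    kinds = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                kinds[entry.name] = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kinds[entry.name] = "dir"
            elif entry.is_file(follow_symlinks=False):
                kinds[entry.name] = "file"
            else:
                kinds[entry.name] = "other"
    return kinds
```

A colorizing 'ls' only needs this much; sizes and timestamps never enter 
into it.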

>> Something like 'ls' certainly doesn't care, but in general applications do
>> care that stat() results aren't cached.  They expect the stat results to
>> reflect the file's state at a point in time _after_ they decide to call
>> stat().  For example, for process A to see how much data a just-finished
>> process B wrote to a file...
>
> AFAICS, it will not change any consistency semantics. The main
> irritation it will introduce will be that the NFS client will suddenly
> have to do things like synchronising readdirplus() and file write() in
> order to provide the POSIX guarantees that you mentioned.
>
> i.e: if someone has written data to one of the files in the directory,
> then an NFS client will now have to flush that data out before calling
> readdir so that the server returns the correct m/ctime or file size.
> Previously, it could delay that until the stat() call.

It sounds like you're talking about a single (asynchronous) client in a 
directory.  In that case, the client need only flush if someone calls 
readdirplus() instead of readdir(), and since readdirplus() is effectively 
also a stat(), the situation isn't actually any different.

The more interesting case is multiple clients in the same directory.  In 
order to provide strong consistency, both stat() and readdir() have to 
talk to the server (or more complicated leasing mechanisms are needed). 
In that scenario, readdirplus() is asking for _less_ 
synchronization/consistency of results than readdir()+stat(), not more. 
That is, both the readdir() and stat() would require a server request in 
order to achieve the standard POSIX semantics, while a readdirplus() would 
allow a single request.  The NFS client already provides weak consistency of 
stat() results for clients.  Extending the interface doesn't suddenly 
require the NFS client to provide strong consistency, it just makes life 
easier for the implementation if it (or some other filesystem) chooses to 
do so.
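
A toy model of the round-trip argument, purely for illustration: ToyServer 
and both listing helpers are invented names, and a real client would of 
course batch, cache and lease in ways this ignores.  It only counts how 
many requests each approach sends when every call has to reach the server.

```python
class ToyServer:
    """Stand-in for a file server that counts round trips (illustrative only)."""

    def __init__(self, files):
        self.files = dict(files)  # name -> size
        self.requests = 0

    def readdir(self):
        self.requests += 1
        return sorted(self.files)

    def stat(self, name):
        self.requests += 1
        return self.files[name]

    def readdirplus(self):
        self.requests += 1
        return sorted(self.files.items())


def list_with_stat(server):
    """readdir() followed by one stat() per entry: 1 + N round trips."""
    return [(name, server.stat(name)) for name in server.readdir()]


def list_with_readdirplus(server):
    """A single readdirplus() round trip returning names and attributes."""
    return server.readdirplus()
```

For a directory of N entries the first helper costs 1 + N requests and the 
second costs one, while both return the same listing.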

Consider two use cases.  Process A is 'ls -al', which doesn't really care 
exactly when the size/mtime values were sampled (any time after opendir() 
will do).  Process B waits for a process on another host to write to a 
file, and then calls stat() locally to check the result.  In order for B to 
get the correct result, stat() _must_ return a value for size/mtime from 
_after_ the stat() call was initiated.  That makes 'ls -al' slow, because it 
probably has to talk to the server to make sure files haven't been modified 
between the readdir() and stat().  In reality, 'ls -al' doesn't care, but 
the filesystem has no way to know that without the presence of 
readdirplus().  Alternatively, an NFS (or other distributed filesystem) 
client can cache file attributes to make 'ls -al' fast, and simply break 
process B (as NFS currently does).  readdirplus() makes it clear what 
'ls -al' doesn't need, allowing the client (if it so chooses) to avoid 
breaking B in the general case.  That simply can't be communicated 
explicitly through the existing interface.  How is that not a win?
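
The two use cases can be sketched with a toy client.  Everything here 
(ToyClient, stat_cached, stat_strict) is a hypothetical name for 
illustration and says nothing about how the NFS client is actually 
structured; the shared dict simply stands in for the server.

```python
class ToyClient:
    """Toy distributed-filesystem client with an attribute cache."""

    def __init__(self, server_sizes):
        self.server = server_sizes  # shared dict standing in for the server
        self.cache = {}             # locally cached sizes

    def readdirplus(self):
        """Weakly consistent listing: cached attributes are good enough."""
        for name, size in self.server.items():
            self.cache.setdefault(name, size)
        return dict(self.cache)

    def stat_cached(self, name):
        """Fast stat() served from the cache; may be stale (breaks B)."""
        return self.cache.setdefault(name, self.server[name])

    def stat_strict(self, name):
        """stat() that always asks the server, which is what B needs."""
        size = self.server[name]
        self.cache[name] = size
        return size
```

With only stat() available, the client has to pick one of the two stat 
behaviours for every caller.  With a separate listing call, 'ls -al' can 
take the cached path and plain stat() can stay strict.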

I imagine that most of the time readdirplus() will hit something in the 
VFS that simply calls readdir() and stat().  But a smart NFS (or other 
network filesystem) client can opt to send a readdirplus over the wire 
for readdirplus() without sacrificing stat() consistency in the general 
case.
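
For the generic case, a fallback along these lines is roughly what I have 
in mind, sketched in userspace Python rather than in the VFS.  The function 
name and its return shape are assumptions, since no readdirplus() interface 
has been settled on.

```python
import os


def readdirplus(path):
    """Hypothetical generic fallback: return [(name, stat_result), ...].

    Equivalent to readdir() followed by lstat() on every entry.  A
    network filesystem could instead satisfy the whole call with one
    request to the server.
    """
    results = []
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False gives lstat() semantics per entry.
            results.append((entry.name, entry.stat(follow_symlinks=False)))
    return results
```

A caller such as 'ls -al' would iterate over the result and never issue a 
separate stat(), which is what tells the filesystem that weaker timing 
guarantees are acceptable for this listing.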

sage
