From: Theodore Tso <tytso@mit.edu>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "Jörn Engel" <joern@lazybastard.org>,
"H. Peter Anvin" <hpa@zytor.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Ulrich Drepper" <drepper@gmail.com>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Neil Brown" <neilb@suse.de>
Subject: Re: If not readdir() then what?
Date: Tue, 10 Apr 2007 09:56:41 -0400
Message-ID: <20070410135641.GG13650@thunk.org>
In-Reply-To: <1176127395.6210.34.camel@heimdal.trondhjem.org>
On Mon, Apr 09, 2007 at 10:03:15AM -0400, Trond Myklebust wrote:
> We could perhaps teach nfsd to open the file without the O_LARGEFILE
> attribute in the case of NFSv2?
That might work. But if in the long term we want to separate out what
we can send back via telldir/seekdir, and some future new POSIX
interface, I wonder if we might be better off defining a formal
interface which can be used by NFSv2 and NFSv3/v4 that isn't
necessarily tied to f_pos. Given that the semantics of
telldir/seekdir are different from what NFS needs
(telldir/seekdir cookies don't have to be persistent), it may be
useful to allow filesystems the option of having two separate methods
for exporting this information.
> Not really.
>
> However on NFSv3 and v4 there is actually a mechanism for declaring that
> the existing set of cookies have expired and are no longer valid: you
> have an 8-byte opaque 'verifier' which is supplied by the server, and
> which is supposed to be returned by the client on every call to READDIR.
> If the server wants to change its cookie scheme, then it signals it to
> the client by changing its verifier, and returning an error whenever the
> client tries to use the old verifier. Upon receiving that error, the
> client is supposed to clear out all cached cookies, and read the
> directory in again from the start.
I looked at that, and it's not really helpful. Basically if NFS
demands that cookies never collide, and states that cookies must be
some small (32- or 64-bit) value that is persistent across time and
server reboots, then that's fundamentally incompatible with any kind
of non-linear directory structure. So whether the filesystem is
ext3/htree, or ntfs, or reiserfs, people will be cheating one way or
another.
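
As a rough illustration of why a persistent small-integer cookie is
awkward for a non-linear directory, here is a toy sketch. It assumes a
directory that keeps its entries in hash order, and uses CRC32 purely
as a stand-in for a real directory hash; hash_cookie, position_cookies
and moved_by_insert are hypothetical names, not taken from any
filesystem.

```python
import zlib


def hash_cookie(name: str) -> int:
    """Hypothetical 32-bit cookie: CRC32 of the name (stand-in for a real directory hash)."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def position_cookies(names):
    """Cookie = linear offset of each entry in the directory's current hash order."""
    ordered = sorted(names, key=hash_cookie)
    return {name: offset for offset, name in enumerate(ordered)}


def moved_by_insert(names, new_name):
    """Return the existing names whose linear-offset cookie changes when new_name is added."""
    before = position_cookies(names)
    after = position_cookies(list(names) + [new_name])
    return {name for name in names if before[name] != after[name]}


names = ["Makefile", "README", "fs", "kernel", "mm", "net"]
# Every entry that sorts after the new one gets a different offset,
# while hash_cookie(name) stays the same for all of them.
print(sorted(moved_by_insert(names, "drivers")))
```

In this toy model the hash-based cookie survives the insert but is not
guaranteed unique, while the offset-based cookie is unique but does not
survive the insert.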
One of the things which they could do I suppose is use a linear
offset, and then change the verifier every single time there is a
b-tree split or merge which changes the configuration of the tree. As
you say, though, forcing the client to re-read the entire contents of
the directory each time we change the verifier doesn't scale too well.
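
The cost of that re-read can be sketched with a toy model. This is
only an assumption-laden simulation, not nfsd or the Linux NFS client:
ToyServer, BadCookie and read_all are hypothetical names, cookies are
plain list offsets, and a "reorganize" simply bumps the verifier the
way a split or merge would under this scheme.

```python
class BadCookie(Exception):
    """Raised when the client presents a cookie with a stale verifier."""


class ToyServer:
    def __init__(self, names):
        self.names = list(names)
        self.verifier = 0

    def reorganize(self):
        # Models a b-tree split/merge that invalidates every outstanding cookie.
        self.verifier += 1

    def readdir(self, cookie, verifier, count):
        # A cookie of 0 means "start from the beginning" and is always accepted.
        if cookie != 0 and verifier != self.verifier:
            raise BadCookie()
        chunk = self.names[cookie:cookie + count]
        next_cookie = cookie + len(chunk)
        eof = next_cookie >= len(self.names)
        return chunk, next_cookie, self.verifier, eof


def read_all(server, count, reorganize_after=()):
    """Read the whole directory; return (names, entries transferred over the wire)."""
    cookie = verifier = 0
    seen, transferred, calls = [], 0, 0
    while True:
        try:
            chunk, cookie, verifier, eof = server.readdir(cookie, verifier, count)
        except BadCookie:
            # Drop everything cached so far and start over from the top.
            cookie, seen = 0, []
            continue
        calls += 1
        transferred += len(chunk)
        seen.extend(chunk)
        if calls in reorganize_after:
            server.reorganize()
        if eof:
            return seen, transferred
```

With ten entries read two at a time, a single reorganization after the
third successful call makes the client transfer 16 entries instead of
10; more frequent verifier changes on a large directory make this
correspondingly worse.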
But the fact of the matter is that if the NFS protocol demands that
each directory entry be uniquely and permanently (including across
server reboots) identified by a small integer cookie, it's
dreaming. Filesystem authors will cheat one way or another, because
there's nothing else for them to do.
> Note also that we would have to fix the client implementation. Nobody
> has bothered working on the code to handle verifier changes since there
> are no servers out there in the wild that use it.
... which means changing the verifier on every node merge/split
operation would probably cause all sorts of interesting breakages, even
more than the occasional hash collision (which as far as I know no one
has complained about so far --- but with a 32-bit cookie, the birthday
paradox says a collision becomes likely once a directory holds on the
order of 65536 entries, so it's probably happened out in the wild
already).
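
To put a rough number on the birthday-paradox estimate, here is a
small sketch. It assumes cookies behave like uniformly random 32-bit
values, which a real directory hash only approximates, and it uses the
standard approximation 1 - exp(-n(n-1)/(2N)); collision_probability is
a hypothetical helper name.

```python
import math


def collision_probability(entries: int, cookie_bits: int = 32) -> float:
    """Approximate chance that at least two of `entries` random cookies coincide."""
    space = 2 ** cookie_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
    return -math.expm1(-entries * (entries - 1) / (2 * space))


for n in (1000, 65536, 1000000):
    print(n, collision_probability(n))
```

Under these assumptions a directory with 65536 entries has roughly a
39% chance of containing at least one colliding pair, and the chance
drops off quickly for smaller directories.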
Regards,
- Ted