From: Theodore Tso <tytso@mit.edu>
To: Ulrich Drepper <drepper@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: If not readdir() then what?
Date: Sat, 7 Apr 2007 16:36:33 -0400 [thread overview]
Message-ID: <20070407203633.GA21555@thunk.org> (raw)
In-Reply-To: <a36005b50704070957v2d2e6c4g5fac49b5ff1a5789@mail.gmail.com>
On Sat, Apr 07, 2007 at 09:57:32AM -0700, Ulrich Drepper wrote:
> In their closed chambers (well, workshops,
> http://lwn.net/Articles/226351/), the filesystem developers complain
> about readdir. I fully appreciate the difficulties. But what I fail
> to see so far is any proposal for an alternative interface.
>
> The phase to get new functionality included in the next revision of
> POSIX is over. But that does not mean we should not try to get some
> sensible new implementation in place. There is, for example, the
> "High End Computing Extensions Working Group" (the guys who showed up
> here with their statlite and readdirplus proposals). This is an
> official working group at the OpenGroup which can produce a document
> which can be the basis of inclusion in the next revision and become a
> OpenGroup specification earlier than that.
>
> So, if anybody has a proposal for better interfaces let's hear them.
> "Now" is a very good time to start working on this.
The problem isn't as much readdir() as it is telldir()/seekdir(),
which fundamentally assumes that the directory is a linear file. JFS
for example was forced to implement an entire separate btree whose
only purpose was to record telldir cookies.
With ext3 and htree we return the directories in hash tree sort order.
This sacrifices some performance with readdir/stat workloads unless
the userspace program does a readdir/qsort on inode number/stat
replacement. In addition, there is the risk of hash collusions
because of the pathetically small size of the telldir cookie (32
bits). If this happens, the readdir() guarantees of only returning
each directory entry at most once are compromised. We use a keyed
hash with a per-filesystem superblock secret to prevent someone from
deliberately finding a hash collision and proving that we're not POSIX
compliant in the edge cases. Cheasy, yes, but only loser programs
should be using telldir/seekdir anyway. :-)
So how do we solve this problem? I can think of two solutions:
1) Deprecate telldir/seekdir() altogether. Relatively few progams use
this functionality, and it is highly questionable how useful it is,
anyway. If you use telldir/seekdir and keep the cookie for a long
time, even the POSIX-provided guarantees about files that are created
and deleted between the telldir() and seekdir() points in time makes
its utility highly dubious.
2) If application programs must have telldir/seekdir, than expand the
size of the cookie from 32-bits to a minimum of 128 bits, and
preferably larger --- say 512 bits, to accomodate systems that might
be using 512-bit variant of SHA-2. So something like this?
/* TELLDIR_COOKIE_SIZE must be >= 64 */
#define TELLDIR_COOKIE_SIZE 64
typedef unsigned char ltelldir_cookie_t[TELLDIR_COOKIE_SIZE];
int ltelldir(int fd, ltelldircookie_t *cookie);
int lseekdir(int fd, ltelldircookie_t *cookie);
My personal preference would be #1, though. :-)
- Ted
next prev parent reply other threads:[~2007-04-07 20:36 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-07 16:57 If not readdir() then what? Ulrich Drepper
2007-04-07 20:36 ` Theodore Tso [this message]
2007-04-07 23:30 ` Christoph Hellwig
2007-04-08 18:11 ` H. Peter Anvin
2007-04-08 18:41 ` Jörn Engel
2007-04-08 19:19 ` Theodore Tso
2007-04-08 19:26 ` Ulrich Drepper
2007-04-08 19:28 ` H. Peter Anvin
2007-04-08 19:40 ` Ulrich Drepper
2007-04-09 1:44 ` Theodore Tso
2007-04-09 11:09 ` Jörn Engel
2007-04-09 12:29 ` Trond Myklebust
2007-04-09 12:31 ` Trond Myklebust
2007-04-09 13:19 ` Theodore Tso
2007-04-09 14:03 ` Trond Myklebust
2007-04-09 16:34 ` Jan Engelhardt
2007-04-09 17:00 ` Trond Myklebust
2007-04-10 13:56 ` Theodore Tso
2007-04-10 14:10 ` Ulrich Drepper
2007-04-10 15:48 ` H. Peter Anvin
2007-04-10 16:42 ` Ulrich Drepper
2007-04-10 14:37 ` Trond Myklebust
2007-04-10 15:54 ` Jan Engelhardt
2007-04-10 16:18 ` H. Peter Anvin
2007-04-10 16:25 ` Valdis.Kletnieks
2007-04-10 21:12 ` Neil Brown
2007-04-10 21:16 ` H. Peter Anvin
2007-04-10 21:43 ` Neil Brown
2007-04-10 21:18 ` Trond Myklebust
2007-04-10 21:37 ` Neil Brown
2007-04-10 21:57 ` Bob Copeland
2007-04-10 21:59 ` Trond Myklebust
2007-04-10 22:33 ` Neil Brown
2007-04-11 0:22 ` Trond Myklebust
2007-04-11 1:45 ` Bernd Eckenfels
2007-04-10 21:46 ` Alan Cox
2007-04-10 21:26 ` Neil Brown
2007-04-09 12:46 ` Andreas Schwab
2007-04-10 21:15 ` Neil Brown
2007-04-11 13:57 ` Jan Engelhardt
2007-04-11 14:42 ` Theodore Tso
2007-04-11 22:32 ` Neil Brown
2007-04-11 22:06 ` David Lang
2007-04-11 23:23 ` H. Peter Anvin
2007-04-11 23:33 ` Jörn Engel
2007-04-12 0:00 ` Neil Brown
2007-04-11 23:22 ` Theodore Tso
2007-04-12 1:46 ` Neil Brown
2007-04-12 2:37 ` Jörn Engel
2007-04-12 5:57 ` Neil Brown
2007-04-12 9:33 ` Jörn Engel
2007-04-12 12:21 ` Theodore Tso
2007-04-12 17:18 ` J. Bruce Fields
2007-04-12 17:35 ` H. Peter Anvin
2007-04-16 3:05 ` Theodore Tso
2007-04-16 5:47 ` Neil Brown
2007-04-16 10:39 ` Theodore Tso
2007-04-16 6:18 ` Neil Brown
2007-04-16 11:07 ` Theodore Tso
2007-04-16 23:24 ` Neil Brown
2007-04-08 18:47 ` Theodore Tso
2007-04-08 19:13 ` H. Peter Anvin
2007-04-08 18:50 ` Ulrich Drepper
2007-04-07 23:44 ` Jan Engelhardt
2007-04-08 20:36 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070407203633.GA21555@thunk.org \
--to=tytso@mit.edu \
--cc=drepper@gmail.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox