From: Theodore Ts'o <tytso@mit.edu>
To: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Akshat Aranya <aaranya@cs.sunysb.edu>, linux-fsdevel@vger.kernel.org
Subject: Re: Expected getdents behaviour
Date: Thu, 15 Sep 2005 11:51:08 -0400
Message-ID: <20050915155108.GE22503@thunk.org>
In-Reply-To: <1126793558.1676.15.camel@imp.csi.cam.ac.uk>
On Thu, Sep 15, 2005 at 03:12:38PM +0100, Anton Altaparmakov wrote:
> Oops. I forgot to answer your question. Yes, the filesystem needs to
> consider the offset value in the second readdir to still be valid. You
> cannot keep rewinding back to zero every time you make a modification or
> you would keep returning entries you have already returned and never
> make any progress if e.g. some user does this in a loop at the same
> time:
POSIX (or SUSv3) does not guarantee that the dirent structure contains
an offset field at all. So a portable application should not count on
d_off being present.
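As a minimal sketch of what a portable consumer looks like, the
function below walks a directory using only d_name, the member that
POSIX does guarantee, and never touches d_off. The name count_entries
and its -1 error convention are assumptions made for this
illustration, not something from any existing program.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: count the entries in `path`, skipping "." and
 * "..". Only d_name is inspected. Returns -1 on error. */
long count_entries(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long n = 0;
    int saved_errno;
    for (;;) {
        /* readdir() returns NULL both at end-of-directory and on
         * error; clearing errno first lets us tell the two apart. */
        errno = 0;
        struct dirent *ent = readdir(dir);
        if (ent == NULL) {
            saved_errno = errno;
            break;
        }
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        n++;
    }

    closedir(dir);
    return saved_errno != 0 ? -1 : n;
}
```

A caller would use it as count_entries("/some/dir") and treat a
negative result as failure.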
That being said, it *is* fair game to assume that an application
should be able to call readdir() repeatedly and get all files in the
directory once and exactly once, even if another process is unlinking
files or adding files while the readdir is going on. The only thing
which is unspecified is whether a file which is deleted or added after
the application has started iterating over the directory will be
included or not. (Think about it; Unix is a multi-user, time-sharing
system. Nothing else makes sense, since otherwise programs that used
readdir() would randomly break if a directory is modified by another
process at the same time.)
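The kind of loop this guarantee is meant to protect can be sketched
as follows: a single pass of readdir() that unlinks each entry as it
is returned, so the directory is being modified while it is read. The
function names, the scratch directory under /tmp, and the file naming
are all assumptions made for this illustration. With a small number
of files the C library may well buffer every entry in one read, so
this shows the calling pattern rather than stress-testing any
particular filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: unlink every entry returned by ONE pass of
 * readdir() over `path`. Returns the number of entries removed, or
 * -1 if the directory cannot be opened. */
int unlink_in_one_pass(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int dfd = dirfd(dir);
    int removed = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        /* The directory changes underneath the open stream here. */
        if (unlinkat(dfd, ent->d_name, 0) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}

/* Hypothetical self-check: create `nfiles` empty files in a scratch
 * directory, remove them in one pass, then remove the directory.
 * Returns the number of files removed, or -1 on any failure. */
int demo_unlink_while_reading(int nfiles)
{
    char tmpl[] = "/tmp/readdir-demo-XXXXXX";
    if (mkdtemp(tmpl) == NULL)
        return -1;

    for (int i = 0; i < nfiles; i++) {
        char name[64];
        snprintf(name, sizeof name, "%s/f%04d", tmpl, i);
        int fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }

    int removed = unlink_in_one_pass(tmpl);

    /* rmdir() only succeeds on an empty directory, so it doubles as
     * a check that the single pass did not skip any file. */
    if (rmdir(tmpl) != 0)
        return -1;
    return removed;
}
```

If readdir() behaves as described above, demo_unlink_while_reading(n)
should return n: every file is seen once, and none is skipped or
returned twice.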
In fact, POSIX requires that telldir() and seekdir() do the right
thing even if directory entries are added or deleted between the
telldir() and seekdir(). Yes, this is hard on directories which use
something more sophisticated than a simple linked list to store their
directory entries (like a b-tree, for example). However, it is
required by POSIX/SUSv3. The JFS filesystem, for example, uses an
entirely separate b-tree just to guarantee telldir() and seekdir()
indexes behave properly in the presence of file inserts and removals.
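For readers unfamiliar with the interface, here is a minimal sketch
of the telldir()/seekdir() pattern: bookmark a position, read past
it, seek back, and check that the same entry comes out again. The
function name bookmark_roundtrip and its return convention are
assumptions made for this illustration. Note that the sketch does not
modify the directory between telldir() and seekdir(); keeping the
bookmark meaningful when entries are added or removed in between is
exactly the part that is hard for the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: skip `skip` entries of `path`, bookmark the
 * position with telldir(), read the next entry, then seekdir() back
 * and read again. Returns 1 if both reads yield the same name, 0 if
 * they differ, -1 on error or if the directory has too few entries. */
int bookmark_roundtrip(const char *path, int skip)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int result = -1;
    char *expected = NULL;
    struct dirent *ent;
    long mark;

    for (int i = 0; i < skip; i++) {
        if (readdir(dir) == NULL)
            goto out;
    }

    mark = telldir(dir);              /* bookmark the current position */

    ent = readdir(dir);
    if (ent == NULL)
        goto out;
    expected = strdup(ent->d_name);   /* copy: the next readdir() may
                                         overwrite the returned entry */
    if (expected == NULL)
        goto out;

    (void)readdir(dir);               /* move further along the stream */

    seekdir(dir, mark);               /* return to the bookmark */
    ent = readdir(dir);
    if (ent == NULL)
        goto out;

    result = (strcmp(expected, ent->d_name) == 0);

out:
    free(expected);
    closedir(dir);
    return result;
}
```

The value returned by telldir() is opaque: it is only meaningful when
handed back to seekdir() on the same directory stream, and should not
be interpreted as a byte offset or an index.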
> Bonnie++'s code is just complete crap... It is the author's fault that
> it will not work on filesystems where the directory entries are not in
> fixed locations...
If Bonnie++ is relying on d_off, then yes. But in fact, if Bonnie++
is just doing a series of readdir()'s, and the filesystem doesn't do
the right thing in the face of concurrent deletes or file creates, it
is in fact the filesystem which is broken. It doesn't matter if the
filesystem is using a sophisticated b-tree data structure; it still
has to do the right thing. There is a lot of hair in ext3, jfs, xfs,
reiserfs, etc. in order to guarantee this to be the case, since it is
expected by Unix applications, and it is required by the standards
specifications.
(I often curse the POSIX specifiers for including telldir/seekdir in
the standards, since it's hell to support, but it's there, and there
are applications which rely on it --- unfortunately.)
- Ted