linux-fsdevel.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Nikita Danilov <nikita@clusterfs.com>
Cc: Jan Blunck <jblunck@suse.de>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	miklos@szeredi.hu, aaranya@cs.sunysb.edu,
	linux-fsdevel@vger.kernel.org
Subject: Re: Expected getdents behaviour
Date: Fri, 16 Sep 2005 07:58:40 -0400
Message-ID: <20050916115840.GA13144@thunk.org>
In-Reply-To: <17194.29785.236460.832682@gargle.gargle.HOWL>

On Fri, Sep 16, 2005 at 11:29:29AM +0400, Nikita Danilov wrote:
>  > Actually, no.  What we do is return the directory entries in hash sort
>  > order, by walking the btree.  And we use the hash as the offset
>  > returned in d_off and via telldir().  The reason for this?  So that if
>  > we add files which causes a node to split, that we still only return
>  > files in the directory once and only once.  If we traversed the tree
> 
> Except for hash collided directory entries, that are returned multiple
> times, because after seekdir() ext2_readdir() restarts from the start of
> hash-bucket, right?

Yes, it's not perfect.  But if you don't use telldir()/seekdir(),
readdir() will return files in a directory once and only once.   
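
To illustrate the hash-order argument quoted above, here is a minimal
simulation, not the ext3 code.  The names HashedDir, list_by_hash and
list_by_index are made up for this sketch, CRC32 stands in for the
filesystem's keyed hash, and colliding hashes are deliberately ignored.
A sorted Python list plays the role of the btree; inserting into it
shifts slot positions much as a node split moves entries on disk.

```python
import bisect
import zlib


def crc_hash(name):
    # Stand-in for the filesystem's keyed directory hash (an assumption).
    return zlib.crc32(name.encode())


class HashedDir:
    """Toy directory that keeps entries sorted by (hash, name)."""

    def __init__(self, names=(), hash_fn=crc_hash):
        self.hash_fn = hash_fn
        self.entries = []
        for name in names:
            self.add(name)

    def add(self, name):
        bisect.insort(self.entries, (self.hash_fn(name), name))

    def next_after(self, cookie):
        # First entry whose hash is strictly greater than the cookie.
        # Colliding hashes are ignored here for brevity.
        i = bisect.bisect_left(self.entries, (cookie + 1, ""))
        return self.entries[i] if i < len(self.entries) else None


def list_by_hash(d, to_add=()):
    """Walk in hash order, creating one pending file after each entry."""
    pending, seen, cookie = list(to_add), [], -1
    while (entry := d.next_after(cookie)) is not None:
        cookie, name = entry
        seen.append(name)
        if pending:
            d.add(pending.pop(0))
    return seen


def list_by_index(d, to_add=()):
    """Same walk, but the cursor is a slot index that shifts on insert."""
    pending, seen, pos = list(to_add), [], 0
    while pos < len(d.entries):
        seen.append(d.entries[pos][1])
        pos += 1
        if pending:
            d.add(pending.pop(0))
    return seen
```

With a hash cursor, a file created behind the cursor is simply not
seen, and nothing is returned twice.  With a slot-index cursor, the same
insertion pushes an already-returned entry back in front of the cursor.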

Actually, there's another exception, but that only happens when you
have a hash collision, readdir() is in the middle of returning
multiple directory entries all with the same hash, and at that moment
the node containing the hash collisions gets split.  This is quite
rare, and pretty much impossible to trigger deliberately --- because
the hash algorithm uses a filesystem specific secret to prevent
attackers from deliberately creating files that could cause a hash
collision, and potentially cause applications to misbehave.

If you do use telldir/seekdir, and f_pos is pointing at multiple
directory entries with the same hash, then yes, you could get some repeats.
But that's the best we can do given the horrific POSIX interface.  You
can solve the problem 100% perfectly if you adjust the filesystem
format so there is a separate b-tree maintained solely for the purpose
of keeping the telldir/seekdir indexes unique, and so you can traverse
the tree in telldir index order.  But of course the downside of that is
that any operation that modifies the directory has extra overhead because
there is an extra b-tree on disk that has to be kept up to date.
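
A rough sketch of both halves of the paragraph above, under stated
assumptions: directory entries are plain (hash, name) tuples, and the
function and class names (telldir_hash, seekdir_hash, CookieIndexedDir)
are hypothetical, not an existing API or on-disk format.  The first
pair shows why a hash cookie replays part of a collision bucket; the
class shows the idea of a second index keyed by a unique cookie.

```python
def telldir_hash(entries, n_read):
    """Cookie after n_read entries: the hash of the next entry."""
    ordered = sorted(entries)
    return ordered[n_read][0] if n_read < len(ordered) else None


def seekdir_hash(entries, cookie):
    """Resume at the first entry carrying that hash (the bucket start)."""
    return [name for h, name in sorted(entries) if h >= cookie]


class CookieIndexedDir:
    """Directory with a second index keyed by unique, never-reused cookies."""

    def __init__(self):
        self.by_cookie = {}      # cookie -> name; stands in for the extra b-tree
        self.next_cookie = 1

    def add(self, name):
        # Every modification also has to update this index: the overhead
        # mentioned in the text.
        self.by_cookie[self.next_cookie] = name
        self.next_cookie += 1

    def telldir(self, n_read):
        cookies = sorted(self.by_cookie)
        return cookies[n_read] if n_read < len(cookies) else self.next_cookie

    def seekdir(self, cookie):
        return [self.by_cookie[c] for c in sorted(self.by_cookie) if c >= cookie]
```

For example, with entries [(1, "a"), (7, "b"), (7, "c"), (7, "d"),
(9, "e")] and three entries already read, the hash cookie is 7, so
resuming returns "b" and "c" a second time.  The cookie-indexed variant
resumes exactly at the fourth entry.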

If you have 64-bit telldir() cookies, and can count on being able to
use 64-bit cookies for NFS, this also makes life easier (but still not
100% perfect); but there are a lot of 32-bit only systems still left
in the world.
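
As a back-of-the-envelope check on why wider cookies help without being
perfect, the usual birthday approximation can be evaluated directly.
This assumes a uniformly distributed hash and that every bit of the
cookie carries hash bits, neither of which is claimed here for any
particular filesystem; collision_probability is a name invented for
this sketch.

```python
import math


def collision_probability(n_entries, cookie_bits):
    """Approximate chance that at least two of n_entries share a cookie."""
    space = 2.0 ** cookie_bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_entries * (n_entries - 1) / (2.0 * space))
```

For a directory of 100,000 entries this gives a probability well over
one half with 32-bit cookies and below one in a billion with 64-bit
cookies: much rarer, but still not zero.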

						- Ted

