From mboxrd@z Thu Jan 1 00:00:00 1970
From: Theodore Ts'o
Subject: Re: Expected getdents behaviour
Date: Fri, 16 Sep 2005 07:58:40 -0400
Message-ID: <20050916115840.GA13144@thunk.org>
References: <1126793268.1676.9.camel@imp.csi.cam.ac.uk>
	<1126793558.1676.15.camel@imp.csi.cam.ac.uk>
	<1126797460.1676.23.camel@imp.csi.cam.ac.uk>
	<20050915164110.GA25573@hasse.suse.de>
	<20050915174658.GA9974@wohnheim.fh-wedel.de>
	<20050915181946.GH22503@thunk.org>
	<17194.29785.236460.832682@gargle.gargle.HOWL>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Jan Blunck, Anton Altaparmakov, miklos@szeredi.hu,
	aaranya@cs.sunysb.edu, linux-fsdevel@vger.kernel.org
Return-path:
Received: from thunk.org ([69.25.196.29]:3737 "EHLO thunker.thunk.org")
	by vger.kernel.org with ESMTP id S932663AbVIPL65 (ORCPT );
	Fri, 16 Sep 2005 07:58:57 -0400
To: Nikita Danilov
Content-Disposition: inline
In-Reply-To: <17194.29785.236460.832682@gargle.gargle.HOWL>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Fri, Sep 16, 2005 at 11:29:29AM +0400, Nikita Danilov wrote:
> > Actually, no. What we do is return the directory entries in hash sort
> > order, by walking the btree. And we use the hash as the offset
> > returned in d_off and via telldir(). The reason for this? So that if
> > we add files which causes a node to split, that we still only return
> > files in the directory once and only once. If we traversed the tree
>
> Except for hash collided directory entries, that are returned multiple
> times, because after seekdir() ext2_readdir() restarts from the start of
> hash-bucket, right?

Yes, it's not perfect. But if you don't use telldir()/seekdir(),
readdir() will return files in a directory once and only once.

Actually, there's another exception, but it only happens when you have
a hash collision, readdir() is in the middle of returning multiple
directory entries that all have the same hash, and at that moment the
node containing the colliding entries gets split. This is quite rare,
and pretty much impossible to trigger deliberately, because the hash
algorithm uses a filesystem-specific secret to prevent attackers from
deliberately creating files that would collide and potentially cause
applications to misbehave.

If you do use telldir/seekdir, and f_pos is pointing at multiple
directory entries with the same hash, then yes, you could get some
repeats. But that's the best we can do given the horrific POSIX
interface.

You can solve the problem 100% perfectly if you adjust the filesystem
format so there is a separate b-tree maintained solely for the purpose
of keeping the telldir/seekdir indexes unique, so that you can traverse
the tree in telldir index order. But of course the downside of that is
that any operation that modifies the directory has extra overhead,
because there is an extra b-tree on disk that has to be kept up to date.

If you have 64-bit telldir() cookies, and can count on being able to
use 64-bit cookies for NFS, this also makes life easier (but still not
100% perfect); but there are a lot of 32-bit only systems still left in
the world.

						- Ted
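
A toy model of the behaviour discussed in this message, for
illustration only. It is not the ext3 code: HashedDir, DirStream,
toy_hash and EOF_COOKIE are made-up names, a sorted Python list stands
in for the on-disk hash tree, and the 3-bit hash is deliberately weak
so that collisions are easy to produce. The sketch assumes the open
stream remembers its exact position in memory, while a telldir() cookie
carries only the hash of the next entry.

```python
import bisect

EOF_COOKIE = 1 << 31  # assumed sentinel, larger than any toy hash value


def toy_hash(name):
    """Deliberately weak 3-bit hash (hypothetical) so names collide easily."""
    return sum(name.encode()) % 8


class HashedDir:
    """Toy directory whose entries are kept sorted by (hash, name)."""

    def __init__(self):
        self.entries = []

    def add(self, name):
        # Inserting may shift list positions, much as a node split moves
        # entries on disk, but the (hash, name) order itself never changes.
        bisect.insort(self.entries, (toy_hash(name), name))


class DirStream:
    """Toy open directory stream that walks entries in hash order."""

    def __init__(self, directory):
        self.directory = directory
        # In-memory position: the next entry is the first one >= this key.
        self.next_key = (0, "")

    def _index(self):
        return bisect.bisect_left(self.directory.entries, self.next_key)

    def readdir(self):
        entries = self.directory.entries
        i = self._index()
        if i == len(entries):
            return None
        h, name = entries[i]
        # name + "\0" is the smallest string sorting strictly after name,
        # so the entry just returned can never be returned again.
        self.next_key = (h, name + "\0")
        return name

    def telldir(self):
        # The cookie is only the hash of the next entry; the name is lost.
        entries = self.directory.entries
        i = self._index()
        return entries[i][0] if i < len(entries) else EOF_COOKIE

    def seekdir(self, cookie):
        # All we can do with a bare hash is restart at the head of its bucket.
        self.next_key = (cookie, "")


def demo():
    d = HashedDir()
    for name in ("ab", "ba", "cd", "x"):  # "ab" and "ba" share a hash
        d.add(name)

    # Case 1: plain readdir() while files are being added.
    s = DirStream(d)
    seen = [s.readdir()]
    d.add("dc")
    d.add("y")
    seen.extend(iter(s.readdir, None))

    # Case 2: telldir()/seekdir() in the middle of a run of colliding entries.
    s = DirStream(d)
    for _ in range(3):
        s.readdir()
    cookie = s.telldir()
    s.seekdir(cookie)
    repeated = s.readdir()
    return seen, cookie, repeated
```

In case 1 every name that existed before the scan started comes back
exactly once, even though two files were added mid-scan. In case 2 the
stream has already returned "ab", the cookie only says "hash 3", and
after seekdir() the first readdir() hands back "ab" a second time. The
rarer exception described above (a node split in the middle of a run of
colliding entries during plain readdir()) is not modelled here, since
this toy stream keeps an exact in-memory position.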