From: Theodore Ts'o
Subject: Re: Expected getdents behaviour
Date: Thu, 15 Sep 2005 11:51:08 -0400
Message-ID: <20050915155108.GE22503@thunk.org>
References: <1126793268.1676.9.camel@imp.csi.cam.ac.uk> <1126793558.1676.15.camel@imp.csi.cam.ac.uk>
In-Reply-To: <1126793558.1676.15.camel@imp.csi.cam.ac.uk>
To: Anton Altaparmakov
Cc: Akshat Aranya, linux-fsdevel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Sep 15, 2005 at 03:12:38PM +0100, Anton Altaparmakov wrote:
> Oops. I forgot to answer your question. Yes, the filesystem needs to
> consider the offset value in the second readdir to still be valid. You
> cannot keep rewinding back to zero every time you make a modification or
> you would keep returning entries you have already returned and never
> make any progress if e.g. some user does this in a loop at the same
> time:

POSIX (or SUSv3) does not guarantee that the offset field (d_off) is present in the dirent structure at all. So a portable application should not count on d_off being present.

That being said, it *is* fair game to assume that an application can call readdir() repeatedly and get every file in the directory exactly once, even if another process is unlinking or adding files while the readdir() loop is running. The only thing which is unspecified is whether a file which is deleted or added after the application has started iterating over the directory will be included or not. (Think about it; Unix is a multi-user, time-sharing system.
Nothing else makes sense, since otherwise programs that use readdir() would break randomly whenever another process modified the directory at the same time.)

In fact, POSIX requires that telldir() and seekdir() do the right thing even if directory entries are added or deleted between the telldir() and the seekdir(). Yes, this is hard on filesystems whose directories use something more sophisticated than a simple linked list to store their entries (a b-tree, for example). However, it is required by POSIX/SUSv3. The JFS filesystem, for example, maintains an entirely separate b-tree just to guarantee that telldir() and seekdir() positions behave properly in the presence of file insertions and removals.

> Bonnie++'s code is just complete crap... It is the author's fault that
> it will not work on filesystems where the directory entries are not in
> fixed locations...

If Bonnie++ is relying on d_off, then yes. But if Bonnie++ is just doing a series of readdir() calls, and the filesystem doesn't do the right thing in the face of concurrent deletes or file creates, then it is the filesystem which is broken. It doesn't matter if the filesystem uses a sophisticated b-tree data structure; it still has to do the right thing. There is a lot of hair in ext3, jfs, xfs, reiserfs, etc. to guarantee this, since it is expected by Unix applications and required by the standards specifications. (I often curse the POSIX specifiers for including telldir/seekdir in the standards, since they are hell to support, but they're there, and there are applications which rely on them, unfortunately.)

- Ted
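The portable pattern described above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not code from this thread: the names `readdir_demo`, `touch_at` and `is_dot_entry`, the `/tmp/readdir-demo-XXXXXX` template and the file counts are all hypothetical, and it assumes a POSIX system providing `mkdtemp()`, `openat()` and `unlinkat()`. It iterates using only `d_name` (never `d_off`), creates extra files partway through the loop (whether those show up is unspecified), checks that every pre-existing file is returned exactly once, and then uses `telldir()`/`seekdir()` to return to a remembered position. It deliberately does not test `seekdir()` across concurrent modifications, since that depends on the filesystem meeting the requirement discussed above.

```c
#define _XOPEN_SOURCE 700 /* telldir(), seekdir(), mkdtemp(), openat(), unlinkat() */

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NFILES 20 /* hypothetical: number of files created before iterating */

/* Create an empty file `name` inside the directory referred to by `dfd`. */
static int touch_at(int dfd, const char *name)
{
    int fd = openat(dfd, name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

static int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Hypothetical demo: returns 0 if each pre-existing file "fNN" is seen
 * exactly once while files "xNN" are added mid-iteration, and if seekdir()
 * to a telldir() position yields the same entry again; nonzero otherwise. */
int readdir_demo(void)
{
    char path[] = "/tmp/readdir-demo-XXXXXX";
    char name[32], expect[256];
    int seen[NFILES] = {0};
    int rc = 1, dfd, i, idx, added = 0;
    long mark;
    DIR *d = NULL;
    struct dirent *e;

    if (mkdtemp(path) == NULL)
        return 1;
    dfd = open(path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        goto out_rmdir;
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof name, "f%02d", i);
        if (touch_at(dfd, name) != 0)
            goto out;
    }

    /* Pass 1: use only d_name (never d_off); add files partway through. */
    if ((d = opendir(path)) == NULL)
        goto out;
    for (;;) {
        errno = 0;
        if ((e = readdir(d)) == NULL)
            break;
        if (is_dot_entry(e->d_name))
            continue;
        if (sscanf(e->d_name, "f%d", &idx) == 1 && idx >= 0 && idx < NFILES)
            seen[idx]++;
        if (!added) {
            /* Whether these later additions are returned is unspecified. */
            for (i = 0; i < NFILES; i++) {
                snprintf(name, sizeof name, "x%02d", i);
                touch_at(dfd, name);
            }
            added = 1;
        }
    }
    if (errno != 0)
        goto out; /* readdir() failed, as opposed to reaching the end */
    for (i = 0; i < NFILES; i++)
        if (seen[i] != 1)
            goto out;

    /* Pass 2: remember a position, read one entry, seek back, re-read. */
    rewinddir(d);
    if (readdir(d) == NULL)
        goto out;
    mark = telldir(d);
    if ((e = readdir(d)) == NULL)
        goto out;
    snprintf(expect, sizeof expect, "%s", e->d_name);
    seekdir(d, mark);
    if ((e = readdir(d)) == NULL || strcmp(e->d_name, expect) != 0)
        goto out;
    rc = 0;

out:
    if (d != NULL)
        closedir(d);
    for (i = 0; i < NFILES; i++) { /* best-effort cleanup */
        snprintf(name, sizeof name, "f%02d", i);
        unlinkat(dfd, name, 0);
        snprintf(name, sizeof name, "x%02d", i);
        unlinkat(dfd, name, 0);
    }
    close(dfd);
out_rmdir:
    rmdir(path);
    return rc;
}
```

A caller would simply check that `readdir_demo()` returns 0. Note that C libraries typically fetch directory entries in batches, so for a small directory the files added mid-loop may never be seen at all; that is permitted, since their inclusion is unspecified. What must hold is that none of the untouched "fNN" files is skipped or returned twice.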