From: Phillip Susi <psusi@cfl.rr.com>
To: Ted Ts'o <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: Large directories and poor order correlation
Date: Tue, 15 Mar 2011 10:01:24 -0400
Message-ID: <4D7F7134.7080209@cfl.rr.com>
In-Reply-To: <20110315001448.GG8120@thunk.org>
On 3/14/2011 8:14 PM, Ted Ts'o wrote:
> The reason why we have to traverse the directory tree in htree order
> is because the POSIX requirements of how readdir() works in the face
> of file deletes and creations, and what needs to happen if a leaf
> block needs to be split. Even if the readdir() started three months
> ago, if in the intervening time, leaf nodes have been split, readdir()
> is not allowed to return the same file twice.
This would also be fixed by having readdir() traverse the linear
directory entries rather than the htree.
> Well, if the file system has been around for a long time, and there
> are lots of "holes" in the inode allocation bitmap, it can happen that
> even without indexing.
Why is that? Sure, if the inode table is full of small holes I can see
them not being allocated sequentially, but why don't they tend to at
least be allocated in ascending order?
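
One way to check this on a given directory is to measure how often consecutive entries come back with increasing inode numbers. The following is a minimal sketch, not something from this thread: the helper names `inodes_in_readdir_order` and `ascending_fraction` are made up for illustration, and it assumes Python's `os.scandir()` yields entries in the order the underlying readdir() returns them, which the Python documentation does not guarantee.

```python
import os


def inodes_in_readdir_order(path):
    """Return inode numbers in the order the directory iterator yields them.

    Assumption: os.scandir() preserves the order of the underlying
    readdir() calls; Python documents the order only as arbitrary.
    """
    with os.scandir(path) as it:
        return [entry.inode() for entry in it]


def ascending_fraction(numbers):
    """Fraction of adjacent pairs that are strictly ascending.

    1.0 means the sequence is fully sorted, 0.0 means every step goes
    backwards. Sequences with fewer than two items count as sorted.
    """
    pairs = list(zip(numbers, numbers[1:]))
    if not pairs:
        return 1.0
    return sum(a < b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(ascending_fraction(inodes_in_readdir_order(target)))
```

A value near 1.0 would suggest the directory order tracks inode order closely; what a particular file system actually produces depends on its allocator and history, so no expected figure is given here.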
> As another example, if you have a large maildir directory w/o
> indexing, and files get removed, deleted, etc., over time the order of
> the directory entries will have very little to do with the inode
> number. That's why programs like mutt sort the directory entries by
> inode number.
Is this what e2fsck -D fixes? Does it rewrite the directory entries in
inode order? I've been toying with the idea of adding directory
optimization support to e2defrag.
To try and clarify this point a bit, are you saying that applications
like tar and rsync should be patched to sort the directory by inode
number, rather than it being the job of the fs to return entries in a
good order?
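
For reference, the application-side approach being asked about could look roughly like the sketch below. It is an illustration only, not code from mutt, tar, or rsync: the function names `names_by_inode` and `stat_in_inode_order` are hypothetical, and the sketch assumes the whole directory listing fits in memory.

```python
import os


def names_by_inode(path):
    """Read the whole directory, then return entry names sorted by inode number."""
    with os.scandir(path) as it:
        # entry.inode() comes from the directory entry itself on Unix,
        # so building this list should not require a stat() per file.
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()
    return [name for _, name in pairs]


def stat_in_inode_order(path):
    """Yield (name, lstat result) pairs, visiting files in ascending inode order."""
    for name in names_by_inode(path):
        yield name, os.lstat(os.path.join(path, name))
```

The idea is that visiting inodes in ascending order turns the stat() pass into a mostly forward sweep over the inode table, at the cost of buffering every entry before the first file is touched.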