From: Eric Sandeen <sandeen@redhat.com>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: Large directories and poor order correlation
Date: Mon, 14 Mar 2011 16:12:49 -0500
Message-ID: <4D7E84D1.5010504@redhat.com>
In-Reply-To: <4D7E8005.4030201@cfl.rr.com>

On 3/14/11 3:52 PM, Phillip Susi wrote:
> On 3/14/2011 4:37 PM, Eric Sandeen wrote:
>> On 3/14/11 3:24 PM, Phillip Susi wrote:
>>> Shouldn't copying or extracting or otherwise populating a large
>>> directory of many small files at the same time result in a strong
>>> correlation between the order the names appear in the directory, and the
>>> order their data blocks are stored on disk, and thus, read performance
>>> should not be negatively impacted by fragmentation?
>>
>> No, because htree (dir_index) dirs return names in hash-value
>> order, not inode number order.  i.e. "at random."
> 
> I thought that the htree was used to look up names, but the normal
> directory was used to enumerate them?  In other words, the htree speeds
> up opening a single file, but slows down traversing the entire
> directory, so should not be used there.

readdir uses htree / dir_index:

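ext3_readdir()
        /* Take the htree path if the fs has the dir_index feature and the
         * directory is either flagged as indexed or exactly one block long. */
        if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
                                    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
            ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
             ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
                err = ext3_dx_readdir(filp, dirent, filldir);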

Because dir_index places entries into blocks in hash order, reading
the directory "like a non-dir_index dir" still gives you out-of-order
entries, I think.  IOW it doesn't slow down readdir itself; it just
gives you a nasty order, which slows down access to those files.
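
To make the ordering effect concrete, here is a minimal userspace sketch,
not kernel code.  It assumes inodes are handed out sequentially in creation
order and uses FNV-1a purely as a stand-in hash (ext3 uses its own directory
hash, not this one).  The names `struct entry`, `fnv1a`,
`fill_and_sort_by_hash` and `print_entries` are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a directory entry: name, inode number, name hash. */
struct entry {
    char name[32];
    unsigned long ino;
    uint32_t hash;
};

/* FNV-1a, a stand-in only; ext3's directory hash is a different function. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

static int cmp_hash(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;

    return (x->hash > y->hash) - (x->hash < y->hash);
}

/* Create n entries in "creation order", then reorder them by name hash. */
void fill_and_sort_by_hash(struct entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        snprintf(e[i].name, sizeof(e[i].name), "file%04zu.txt", i);
        e[i].ino = 1000 + i;    /* assumption: sequential inode allocation */
        e[i].hash = fnv1a(e[i].name);
    }
    if (n > 1)
        qsort(e, n, sizeof(*e), cmp_hash);
}

/* Print entries in the order a hash-ordered listing would return them. */
void print_entries(const struct entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%08x  ino=%lu  %s\n",
               (unsigned)e[i].hash, e[i].ino, e[i].name);
}
```

Calling fill_and_sort_by_hash() on a few dozen entries and then
print_entries() lets you compare the ino= column against the listing
order; how scrambled it looks depends on the hash and the names.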

> Also isn't htree only enabled for large directories?  I still see crummy
> correlation for small ( < 100 files, even one with only 8 entries )
> directories.

Nope, it's used for all directories AFAIK.  Certainly shows the most
improvement on lookups in large directories though...

> It seems unreasonable to ask applications to read all directory entries,
> then sort them by inode number to achieve reasonable performance.  This
> seems like something the fs should be doing.

Yeah, this has been a longstanding nastiness...
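
For reference, the application-side workaround described above might look
roughly like the sketch below: read every entry, sort by d_ino, then touch
the files in that order.  This is an illustration under assumptions, not
code from any particular tool; `struct name_ino`, `read_dir_sorted_by_ino`
and `free_sorted` are hypothetical names, and whether inode order actually
tracks on-disk data order depends on how the filesystem allocated things.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: inode number plus a heap-allocated copy of the name. */
struct name_ino {
    ino_t ino;
    char *name;
};

static int cmp_ino(const void *a, const void *b)
{
    const struct name_ino *x = a, *y = b;

    return (x->ino > y->ino) - (x->ino < y->ino);
}

/* Free an array returned by read_dir_sorted_by_ino(). */
void free_sorted(struct name_ino *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].name);
    free(v);
}

/*
 * Read all entries of `path` except "." and "..", sorted by inode number.
 * Returns 0 and fills *out / *count on success, -1 on error.
 */
int read_dir_sorted_by_ino(const char *path, struct name_ino **out,
                           size_t *count)
{
    struct name_ino *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;
    DIR *d = opendir(path);

    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        size_t len = strlen(de->d_name) + 1;

        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (n == cap) {         /* grow the array geometrically */
            size_t ncap = cap ? cap * 2 : 64;
            struct name_ino *nv = realloc(v, ncap * sizeof(*nv));

            if (!nv)
                goto fail;
            v = nv;
            cap = ncap;
        }
        v[n].name = malloc(len);
        if (!v[n].name)
            goto fail;
        memcpy(v[n].name, de->d_name, len);
        v[n].ino = de->d_ino;
        n++;
    }
    closedir(d);
    if (n > 1)
        qsort(v, n, sizeof(*v), cmp_ino);
    *out = v;
    *count = n;
    return 0;
fail:
    closedir(d);
    free_sorted(v, n);
    return -1;
}
```

A caller would then open or stat v[0].name, v[1].name, ... in array order
and release the array with free_sorted(v, n) when done.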

-Eric
