linux-ext4.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: Large directories and poor order correlation
Date: Mon, 14 Mar 2011 17:52:49 -0400
Message-ID: <20110314215249.GE8120@thunk.org>
In-Reply-To: <4D7E8005.4030201@cfl.rr.com>

On Mon, Mar 14, 2011 at 04:52:21PM -0400, Phillip Susi wrote:
> It seems unreasonable to ask applications to read all directory entries,
> then sort them by inode number to achieve reasonable performance.  This
> seems like something the fs should be doing.

Unfortunately the kernel can't do it, because a directory could be
arbitrarily big, and kernel memory is non-swappable.  In addition,
what if a process opens a directory, starts calling readdir, pauses in
the middle, and then holds onto it for days, weeks, or months?

Combine that with POSIX requirements about how readdir() has to behave
if files are added or deleted during a readdir() session (even a
readdir session which takes days, weeks, or months), and it's a
complete mess.

It's not hard to provide library routines that do the right thing, and
I have written an LD_PRELOAD which intercepts opendir() and readdir()
calls and does the sorting in userspace.  Perhaps the right answer is
getting this into libc, but I have exactly two words for you: "Ulrich
Drepper".

					- Ted
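
For illustration only, the userspace approach discussed in this message
(read every directory entry, sort by inode number, then touch the files in
that order) might be sketched as follows. This is not the LD_PRELOAD
library mentioned above, which is not shown in the thread; the function
names entries_in_inode_order, stat_all, and demo are hypothetical, and the
sketch assumes the whole directory listing fits in process memory.

```python
import os
import tempfile


def entries_in_inode_order(path):
    """Return (inode, name) pairs for `path`, sorted by inode number."""
    with os.scandir(path) as it:
        # Slurp the whole directory first. Memory use grows with the
        # directory size, which userspace can afford (it is swappable).
        entries = [(entry.inode(), entry.name) for entry in it]
    entries.sort()
    return entries


def stat_all(path):
    """stat() every entry in inode order and return {name: size}."""
    sizes = {}
    for _inode, name in entries_in_inode_order(path):
        st = os.stat(os.path.join(path, name), follow_symlinks=False)
        sizes[name] = st.st_size
    return sizes


def demo():
    """Build a small scratch directory and walk it in inode order."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(5):
            with open(os.path.join(d, f"file{i}"), "w") as f:
                f.write("x" * i)
        return entries_in_inode_order(d), stat_all(d)
```

Because the listing is a snapshot taken up front, files created or deleted
afterwards are not reflected, which is acceptable for a short-lived scan in
an application but is exactly the kind of behaviour the kernel cannot
impose on every readdir() caller.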
