git.vger.kernel.org archive mirror
From: "Martin Langhoff" <martin.langhoff@gmail.com>
To: "Yannick Gingras" <ygingras@ygingras.net>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: Re: On the many files problem
Date: Tue, 1 Jan 2008 12:31:25 +1300
Message-ID: <46a038f90712311531x17b6d94aua1bac2d3d186f186@mail.gmail.com>
In-Reply-To: <873atjtbmu.fsf@enceladus.ygingras.net>

On Dec 31, 2007 11:13 PM, Yannick Gingras <ygingras@ygingras.net> wrote:
> >    but if you want to check odder cases, try creating a huge
> >    directory, and then deleting most files, and then adding a few
> >    new ones. Some filesystems will take a huge hit because they'll
> >    still scan the whole directory, even though it's mostly empty!
> >
> >    (Also, a "readdir() + stat()" loop will often get *much* worse access
> >    patterns if you've mixed deletions and creations)
>
> This is something that will be interesting to benchmark later on.  So,
> an application with a lot of turnaround, say a mail server, should
> delete and re-create the directories from time to time?  I assume this
> is specific to some file system types.

This is indeed the case. Directories with a lot of movement get
fragmented on most FSs -- ext3 is a very bad case for this -- and
there are no "directory defrag" tools other than regenerating them.
The "Maildir" storage used for many IMAP servers these days shows the
problem.
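
One way to read "regenerating" a directory is: create a fresh directory
next to the old one, move every entry across, and let the fresh one take
over the old name. The sketch below is only an illustration under that
assumption. The function name regenerate_directory is hypothetical, and
whether a rebuild actually helps depends on the filesystem.

```python
import os
import stat
import tempfile


def regenerate_directory(path):
    """Rebuild `path` by moving its entries into a newly created directory.

    Hypothetical helper for illustration only. Returns the number of
    entries moved.
    """
    path = os.path.abspath(path)
    parent = os.path.dirname(path)

    # Create the replacement beside the original so that every rename()
    # stays on the same filesystem.
    fresh = tempfile.mkdtemp(prefix=os.path.basename(path) + ".new.", dir=parent)

    # mkdtemp() creates the directory with mode 0700; copy the original
    # permission bits across.
    os.chmod(fresh, stat.S_IMODE(os.stat(path).st_mode))

    # Snapshot the names first so the directory is not modified while it
    # is being read.
    names = sorted(os.listdir(path))
    for name in names:
        os.rename(os.path.join(path, name), os.path.join(fresh, name))

    os.rmdir(path)          # the old directory is now empty
    os.rename(fresh, path)  # the new directory takes over the old name
    return len(names)
```

This is not atomic: between rmdir() and the final rename() the path does
not exist, and files created in the old directory during the move would
make rmdir() fail. A busy mail server would presumably need delivery
paused while it runs.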

This (longish) thread has some interesting tidbits on getdents() and
directory fragmentation.
http://kerneltrap.org/mailarchive/git/2007/1/7/235215
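
The churn scenario quoted above (create a huge directory, delete most
files, add a few new ones, then run a readdir() + stat() loop) can be
tried out with something like the following. It is a rough sketch, not a
rigorous benchmark: the function names, file-name patterns and default
sizes are all made up for illustration, and the page cache is warm when
the timing runs, so it will understate the cold-cache cost.

```python
import os
import tempfile
import time


def scan(path):
    """readdir() + stat() loop: list every entry, then stat each by name."""
    start = time.perf_counter()
    count = 0
    for name in os.listdir(path):
        os.stat(os.path.join(path, name))
        count += 1
    return count, time.perf_counter() - start


def churn(path, total, keep_every, added):
    """Create `total` files, delete all but every `keep_every`-th one,
    then add `added` new files."""
    for i in range(total):
        open(os.path.join(path, f"old{i:07d}"), "w").close()
    for i in range(total):
        if i % keep_every:
            os.unlink(os.path.join(path, f"old{i:07d}"))
    for i in range(added):
        open(os.path.join(path, f"new{i:07d}"), "w").close()


def compare(total=20000, keep_every=100, added=50):
    """Return ((count, seconds), (count, seconds)) for a churned directory
    and for a fresh directory holding the same number of entries."""
    with tempfile.TemporaryDirectory() as churned, \
            tempfile.TemporaryDirectory() as fresh:
        churn(churned, total, keep_every, added)
        survivors = len(os.listdir(churned))
        for i in range(survivors):
            open(os.path.join(fresh, f"f{i:07d}"), "w").close()
        return scan(churned), scan(fresh)
```

Calling compare() gives the two timings side by side. Any difference will
depend on the filesystem backing the temporary directory, so the numbers
are only meaningful when compared on the same machine and mount.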

cheers,


m


Thread overview: 6+ messages
2007-12-29 18:22 On the many files problem Yannick Gingras
2007-12-29 19:12 ` Linus Torvalds
2007-12-31 10:13   ` Yannick Gingras
2007-12-31 20:45     ` Linus Torvalds
2007-12-31 23:31     ` Martin Langhoff [this message]
2007-12-29 19:27 ` Junio C Hamano
