public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Phillips <phillips@innominate.de>
To: Jeremy Jackson <jeremy.jackson@sympatico.ca>,
	Mike Dresser <mdresser@windsormachine.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [rfc] Near-constant time directory index for Ext2
Date: Wed, 21 Feb 2001 00:08:35 +0100	[thread overview]
Message-ID: <01022100361408.18944@gimli> (raw)
In-Reply-To: <01022020011905.18944@gimli> <3A92DF84.E39E415C@windsormachine.com> <3A92F17E.BFEDEADD@sympatico.ca>
In-Reply-To: <3A92F17E.BFEDEADD@sympatico.ca>

On Tue, 20 Feb 2001, Jeremy Jackson wrote:
> Mike Dresser wrote:
> 
> > the way i'm reading this, the problem is there's 65535 files in the directory
> > /where/postfix/lives.  rm * or what have you, is going to take forever and
> > ever, and bog the machine down while its doing it.  My understanding is you
> > could do the rm *, and instead of it reading the tree over and over for every
> > file that has to be deleted, it just jumps one or two blocks to the file that's
> > being deleted, instead of thousands of files to be scanned for each file
> > deleted.
> 
> I thought about it again, and the proformance problem with "rm *" is that
> the shell reads and sorts the directory, passes each file as a separate
> argument to rm, which then causes the kernel to lookup each file
> from a random directory block (random because of previous sort),
> modify that directory block, then read another... after a few seconds
> the modified blocks start to be written back to disk while new ones
> are looked up... disk seek contention.  and this becomes hard on the
> dir. block cache (wherever this is) since from source each dir entry
> is just over 256 bytes (?) 65535 files would require 16MB to
> cache dir entries.  Plus it has to read in all the inodes, modify,
> then write, taking up xxMB more.  You're probably swapping
> out,  with swap partition on same disk, the disk may explode.
> 
> If it were truly doing a linear scan, it might be faster.  Two
> successive mods to same dir block would be merged
> onto same write.
> 
> Perhaps rm -rf . would be faster?  Let rm do glob expansion,
> without the sort.  Care to recreate those 65535 files and try it?
> 
> or use ls with the nosort flag pipe through xargs then to rm...
> again loose sorting but don't delete directory or subdirs.

Indeed, rm -rf is faster.  It does a readdir to get all the directory
entries in internal order, then calls unlink to remove them, one at a
time.  This removes each entry from the front of the file, shortening
the time that has to be spent scanning forward in the file to find the
target entry.  Manfred Spraul observed that this could be speeded up
with by caching the file position, and sent me a patch to do that.  It
did speed things up - about 20%.

But actually, rm is not problem, it's open and create.  To do a
create you have to make sure the file doesn't already exist, and
without an index you have to scan on average half the directory file. 
Open requires a similar scan.  Here we are talking about using an index
to speed that up quadraticly when operating on N files.  That is the
real gravy.

-- 
Daniel

  reply	other threads:[~2001-02-20 23:37 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-02-20 15:04 [rfc] Near-constant time directory index for Ext2 Daniel Phillips
2001-02-20 20:03 ` Linus Torvalds
2001-02-20 21:08   ` Jeremy Jackson
2001-02-20 21:20     ` Mike Dresser
2001-02-20 22:36       ` Jeremy Jackson
2001-02-20 23:08         ` Daniel Phillips [this message]
2001-02-21  1:04           ` Bernd Eckenfels
2001-02-21 16:38             ` Daniel Phillips
2001-02-20 22:58       ` Jonathan Morton
2001-02-20 21:41   ` Daniel Phillips
2001-02-21  0:22     ` Linus Torvalds
2001-02-21  0:30       ` Alan Cox
2001-02-21  2:35         ` Ed Tomlinson
2001-02-21 23:13           ` Linus Torvalds
2001-02-21 23:34             ` Davide Libenzi
2001-02-21 23:59               ` Linus Torvalds
2001-02-21 23:57             ` H. Peter Anvin
2001-02-22  0:35             ` Ed Tomlinson
2001-02-21  1:01       ` Andreas Dilger
2001-02-22  2:28       ` Daniel Phillips
2001-02-22  3:30         ` Linus Torvalds
2001-02-22 16:33           ` Chris Mason
2001-02-22 22:30           ` Daniel Phillips
2001-02-21 17:21 ` Davide Libenzi
2001-02-21 21:08   ` Martin Mares
2001-02-21 21:29     ` Davide Libenzi
2001-02-21 21:32       ` Martin Mares
2001-02-21 21:59         ` Davide Libenzi
2001-02-21 22:26           ` Martin Mares
2001-02-21 22:43             ` Davide Libenzi
2001-02-21 22:14         ` H. Peter Anvin
2001-02-21 22:32           ` Martin Mares
2001-02-21 22:38             ` H. Peter Anvin
2001-02-21 22:50               ` Martin Mares
2001-02-21 22:54                 ` H. Peter Anvin
2001-02-21 23:07                   ` Martin Mares
2001-02-21 23:15                     ` H. Peter Anvin
2001-02-21 23:42                       ` Daniel Phillips
2001-02-21 23:52                         ` Davide Libenzi
     [not found]                       ` <3A945081.E6EB78F4@innominate.de>
2001-02-21 23:48                         ` H. Peter Anvin
2001-02-22  1:22                           ` Daniel Phillips
2001-02-22  1:42                             ` H. Peter Anvin
2001-02-22  2:03                             ` Andreas Dilger
2001-02-22  2:41                               ` H. Peter Anvin
2001-02-22  3:43                                 ` Daniel Phillips
2001-02-22  4:02                                   ` Linus Torvalds
2001-02-22  5:19                                     ` Linus Torvalds
2001-02-22 11:31                                       ` Ingo Oeser
2001-02-22 18:20                                         ` Linus Torvalds
2001-02-22  4:02                                   ` H. Peter Anvin
2001-02-22  7:03                                     ` Andreas Dilger
2001-02-22  4:03                                   ` H. Peter Anvin
2001-02-22 10:35                                     ` Alan Cox
2001-02-23  0:59                                       ` Felix von Leitner
2001-02-22  3:08                               ` Daniel Phillips
2001-02-22  8:06                                 ` [rfc] [LONG] " Andreas Dilger
2001-02-22  7:20                       ` [rfc] " Bill Wendling
2001-02-22  8:34                       ` Rogier Wolff
2001-02-21 23:26                     ` Jamie Lokier
2001-02-22 19:04                     ` Kai Henningsen
2001-02-22  6:23 ` [Ext2-devel] " tytso
2001-02-22  7:24   ` Daniel Phillips
2001-02-22 13:20     ` tytso
2001-02-22 18:16       ` Andreas Dilger
2001-02-22 23:04         ` Daniel Phillips
2001-02-23 20:11           ` tytso
2001-02-24  0:32             ` Andreas Dilger
2001-02-22 23:40         ` tytso
2001-02-22 18:38 ` Kai Henningsen
     [not found] <Pine.LNX.4.10.10102211740550.1933-100000@coffee.psychology.mcmaster.ca>
2001-02-21 22:47 ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2001-02-23  1:52 Andries.Brouwer
2001-02-23 21:43 ` Ralph Loader
2001-02-23 22:37   ` Guest section DW
2001-02-24  2:47     ` Ralph Loader
2001-02-24  5:34     ` Ralph Loader
2001-02-23  2:49 Andries.Brouwer
2001-02-23  3:42 ` Daniel Phillips
2001-02-23 12:20 ` Jonathan Morton
2001-02-23 18:57   ` Andreas Dilger
2001-02-23 12:38 Andries.Brouwer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01022100361408.18944@gimli \
    --to=phillips@innominate.de \
    --cc=jeremy.jackson@sympatico.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mdresser@windsormachine.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox