public inbox for linux-kernel@vger.kernel.org
From: Bill Crawford <billc@netcomuk.co.uk>
To: Linux Kernel <linux-kernel@vger.kernel.org>
Cc: "H. Peter Anvin" <hpa@transmeta.com>,
	Daniel Phillips <phillips@innominate.de>
Subject: Hashing and directories
Date: Thu, 22 Feb 2001 23:08:45 +0000
Message-ID: <3A959BFD.B18F833@netcomuk.co.uk>

 I was hoping to point out that in real life, most systems that
need to access large numbers of files are already designed to do
some kind of hashing, or at least to divide-and-conquer by using
multi-level directory structures.
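
 A minimal sketch of the hashing variant, assuming a two-level
layout keyed on a digest of the filename.  The function name, the
choice of SHA-1 and the two-character bucket width are all
illustrative assumptions, not a description of any particular system:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(root, filename, levels=2, width=2):
    """Return root/<b0>/<b1>/.../filename for a hashed directory layout.

    Hypothetical helper: each bucket name is a slice of the hex digest
    of the filename. SHA-1 is used only to spread names evenly across
    buckets, not for security.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *buckets, filename)
```

 With the defaults this gives 256 buckets per level, so even a few
million files end up spread thinly across the leaf directories.  The
cost is that the bucket names mean nothing to a human reader.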

 A particular reason for this, apart from filesystem efficiency,
is to make it easier for people to find things, as it is usually
easier to spot what you want amongst a hundred things than among
a thousand or ten thousand.

 A couple of practical examples from work here at Netcom UK (now
Ebone :) would be, say, DNS zone files or user authentication data.
We use Solaris and NFS a lot, too, so large directories are a bad
thing in general for us.  We tend to subdivide things using a
very simple scheme: taking the first letter, and then sometimes
the second letter or a pair of letters, from the filename.  This
actually works extremely well in practice and, as mentioned above,
provides some positive side-effects.
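
 One reading of that scheme, as a rough sketch.  The helper name,
the lower-casing and the '_' padding for very short names are
assumptions added for illustration, and the example paths are made up:

```python
from pathlib import PurePosixPath


def prefix_path(root, filename, pair=False):
    """Place filename under root/<first letter>[/<first two letters>]/.

    Hypothetical helper. Lower-casing and the '_' fallback for empty or
    one-character names are assumptions, not part of the scheme itself.
    """
    name = filename.lower()
    first = name[:1] or "_"
    parts = [first]
    if pair:
        # Optional second level keyed on the first pair of letters.
        parts.append(name[:2].ljust(2, "_"))
    return PurePosixPath(root, *parts, filename)


print(prefix_path("/var/named", "example.com"))             # /var/named/e/example.com
print(prefix_path("/var/named", "example.com", pair=True))  # /var/named/e/ex/example.com
```

 Unlike a digest-based layout, the directory names here stay
readable, which is where the side-effect for humans comes from.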

 So I don't think it would actually be sensible to encourage
anyone to use massive directories for too many tasks.  They make
human intervention on a broken system rather painful, for example
when it takes a long time to find the file you're looking for.

 I guess what I really mean is that I think Linus' strategy of
generally optimizing for the "usual case" is a good thing.  It
is actually quite annoying in general to have that many files in
a single directory (think \winnt\... here).  So maybe it would
be better to focus on the normal situation of, say, a few hundred
files in a directory rather than thousands ...

 I still think it's a good idea to do anything you can to speed
up large directory operations on ext2 though :)

 On the plus side, hashes or anything resembling tree structures
would tend to improve the characteristics of insertion and removal
of entries on even moderately sized directories, which would
probably provide a net gain for many folks.

-- 
/* Bill Crawford, Unix Systems Developer, ebOne, formerly GTS Netcom */
#include "stddiscl.h"
