From: Bill Crawford <billc@netcomuk.co.uk>
To: "H. Peter Anvin" <hpa@transmeta.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: Hashing and directories
Date: Thu, 22 Feb 2001 23:54:08 +0000
Message-ID: <3A95A6A0.6E191F2C@netcomuk.co.uk>
In-Reply-To: <3A959BFD.B18F833@netcomuk.co.uk> <3A959F35.A99CEEEC@transmeta.com>
"H. Peter Anvin" wrote:
> Bill Crawford wrote:
...
> > We use Solaris and NFS a lot, too, so large directories are a bad
> > thing in general for us, so we tend to subdivide things using a
> > very simple scheme: taking the first letter and then sometimes
> > the second letter or a pair of letters from the filename. This
> > actually works extremely well in practice, and as mentioned above
> > provides some positive side-effects.
> This is sometimes feasible, but sometimes it is a hack with painful
> consequences in the form of software incompatibilities.
*grin*
We did change the scheme between different versions of our local
software, and that caused one or two small nightmares for me and a
couple other guys who were developing/maintaining systems here.
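For illustration, the subdivision scheme quoted above (first letter, then sometimes the second letter) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the local software discussed in this thread: the function name `shard_path`, the `levels` parameter, the lower-casing and the `_` placeholder for names shorter than `levels` are all hypothetical.

```python
import posixpath


def shard_path(root, filename, levels=1):
    """Hypothetical helper: map a filename to root/<1st letter>[/<2nd letter>]/filename.

    levels=1 uses only the first letter; levels=2 adds the second letter.
    Names shorter than `levels` get "_" as a placeholder (an assumption).
    """
    name = filename.lower()
    parts = [name[i] if i < len(name) else "_" for i in range(levels)]
    return posixpath.join(root, *parts, filename)


one_level = shard_path("/data", "hello.txt")
two_levels = shard_path("/data", "hello.txt", levels=2)
short_name = shard_path("/data", "a", levels=2)
```

Note that changing `levels` moves every file to a different path, which is exactly the kind of scheme change between software versions that causes trouble.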
...
I don't mind improving performance on big directories -- Solaris
sucks when listing a large directory, for example, but it is rock
solid, which is important where we use it.
My worry is that old thing about giving people enough rope to hang
themselves; I'm humanitarian enough that I don't like doing that.
In other words, if we go out and tell people they can put millions
of files in a directory on Linux+ext2, they'll do it, and then they
are going to be upset because 'ls -l' takes a few minutes :)
> > I guess what I really mean is that I think Linus' strategy of
> > generally optimizing for the "usual case" is a good thing. It
> > is actually quite annoying in general to have that many files in
> > a single directory (think \winnt\... here). So maybe it would
> > be better to focus on the normal situation of, say, a few hundred
> > files in a directory rather than thousands ...
> Linus' strategy is to not let optimizations for uncommon cases inflict
> the common case. However, I think we can make an improvement here that
> will work well even on moderate-sized directories.
That's a good point ... I have mis-stated Linus' intention.
I guess he may be along to tick me off in a minute :)
I have no quibbles with that at all ... improvements to the
general case never hurt, even if the greater gain is elsewhere ...
> My main problem with the fixed-depth tree proposal is that it seems to
> work well for a certain range of directory sizes, but the range seems a
> bit arbitrary. The case of very small directories is also quite
> important, too.
Yup.
Sounds like a pretty good idea, however I would be concerned about
the side-effects of, say, getting a lot of hash collisions from a
pathological data set. Very concerned.
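A small sketch of what a pathological data set can do to a hashed directory. The hash here is a deliberately weak, hypothetical one (sum of bytes modulo the bucket count); it is an assumption for illustration only and is not the hash from the proposal under discussion. The names `weak_hash` and `bucket_sizes` and the bucket count of 64 are likewise made up.

```python
from itertools import permutations


def weak_hash(name, buckets=64):
    # Deliberately poor hash (assumption): byte sum ignores character order.
    return sum(name.encode()) % buckets


def bucket_sizes(names, buckets=64):
    # Count how many names land in each bucket.
    sizes = [0] * buckets
    for n in names:
        sizes[weak_hash(n, buckets)] += 1
    return sizes


# Every permutation of the same six letters has the same byte sum,
# so all of these distinct names collide in a single bucket.
pathological = ["".join(p) for p in permutations("abcdef")]
sizes = bucket_sizes(pathological)
used_buckets = sum(1 for s in sizes if s > 0)
```

With every name in one bucket, a lookup degenerates into a linear scan of that bucket, which is no better than the unhashed directory.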
I prefer the idea of a real tree-structure ... ReiserFS already
gives very good performance for searching using find, and "rm -rf"
truly is very fast, and I would actually like the benefits of the
structure without the journalling overhead for some filesystems.
I'm thinking especially of /usr and /usr/src here ...
> -hpa
> "Unix gives you enough rope to shoot yourself in the foot."
Doesn't it just? That was my fear ...
Anyway, 'nuff said, just wanted to comment from my experiences.
> http://www.zytor.com/~hpa/puzzle.txt
--
/* Bill Crawford, Unix Systems Developer, ebOne, formerly GTS Netcom */
#include "stddiscl.h"