From: Bill Crawford <bill@ops.netcom.net.uk>
To: Pavel Machek <pavel@suse.cz>
Cc: Bill Crawford <billc@netcomuk.co.uk>,
Linux Kernel <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@transmeta.com>,
Daniel Phillips <phillips@innominate.de>
Subject: Re: Hashing and directories
Date: Sat, 03 Mar 2001 00:03:38 +0000
Message-ID: <3AA034DA.1C3ADA41@ops.netcom.net.uk>
In-Reply-To: <3A959BFD.B18F833@netcomuk.co.uk> <20000101020213.D28@(none)>
Pavel Machek wrote:
> Hi!
> > I was hoping to point out that in real life, most systems that
> > need to access large numbers of files are already designed to do
> > some kind of hashing, or at least to divide-and-conquer by using
> > multi-level directory structures.
> Yes -- because they work around kernel slowness.
Not just kernel ... because we use NFS a lot, directory searching is
a fair bit quicker with smaller directories (especially when looking
manually at things).
> I had to do this kind of hashing because kernel disliked 70000 html
> files (copy of train time tables).
> BTW try rm * with 70000 files in directory -- command line will overflow.
Sort of my point, again. There are limits to what is sane.
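The overflow mentioned above happens because the shell expands "*" into
one argument list, and that list can exceed the system's limit on the
size of arguments passed to a new process. A minimal sketch, assuming
Python 3 and a hypothetical helper name remove_all_files (neither is
from this thread), that avoids the limit by walking the directory
instead of building a command line:

```python
import os


def remove_all_files(directory):
    """Delete every regular file directly inside `directory`.

    Hypothetical helper for illustration. Entries are streamed with
    os.scandir(), so no argument list is ever built and the limit on
    exec argument size does not come into play. Subdirectories and
    symlinks are left alone. Returns the number of files removed.
    """
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False: only true regular files qualify.
            if entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)
                removed += 1
    return removed
```

This only sidesteps the command-line limit; it does nothing about how
slowly the filesystem itself handles a directory of that size.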
Another example I have cited -- our ticketing system -- is a good one.
If there is subdivision, it can be easier to search subsets of the data.
Can you imagine a source tree with 10k files, all in one directory? I
think *people* need subdivision more than the machines do, a lot of the
time. Another example would be mailboxes ... I have started to build a
hierarchy of mail folders because I have more than a screenful.
> Yes? Easier to type cat timetab1/2345 than cat timetab12345? With bigger
> command line size, putting it into *one* directory is definitely easier.
IMO (strictly my own) it is often easier to have things subdivided.
I have had to split up my archive of linux tarballs and patches because
it was getting too big to vgrep.
> > A couple of practical examples from work here at Netcom UK (now
> > Ebone :), would be say DNS zone files or user authentication data.
> > We use Solaris and NFS a lot, too, so large directories are a bad
> > thing in general for us, so we tend to subdivide things using a
> > very simple scheme: taking the first letter and then sometimes
> > the second letter or a pair of letters from the filename. This
> > actually works extremely well in practice, and as mentioned above
> > provides some positive side-effects.
> Positive? Try listing all names that contain "linux" with such case. I'll
> do ls *linux*. You'll need ls */*linux* ?l/inux* li/nux*. Seems ugly to
> me.
It's not that bad, as we tend to be fairly consistent in a scheme. I
only have to remember one of those combinations at a time :)
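The first-letter scheme and the search across it can be sketched roughly
as follows. This is an illustration under stated assumptions, not the
code used at Netcom: the function names (bucket_path, store,
find_containing), the lower-casing of the bucket key, and the choice to
keep the full filename inside the bucket are all hypothetical.

```python
import glob
import os


def bucket_path(root, name, levels=1):
    """Return the path where `name` would live under `root`.

    Assumed layout: levels=1 gives root/l/linux.org, levels=2 gives
    root/l/li/linux.org. The full filename is kept inside the bucket.
    """
    if not name or not name[0].isalnum():
        raise ValueError("name must start with a letter or digit")
    key = name.lower()
    # Bucket directories are successively longer prefixes of the name.
    parts = [key[:i] for i in range(1, levels + 1)]
    return os.path.join(root, *parts, name)


def store(root, name, data, levels=1):
    """Write `data` to the bucketed location for `name`; return the path."""
    path = bucket_path(root, name, levels)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(data)
    return path


def find_containing(root, text, levels=1):
    """List stored names containing `text`, searching every bucket."""
    wildcards = ["*"] * levels  # one wildcard per bucket level
    pattern = os.path.join(
        glob.escape(root), *wildcards, "*" + glob.escape(text) + "*"
    )
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

As long as the depth is applied consistently, a search for "linux" is a
single wildcard pattern, comparable to ls */*linux*, whichever bucket
the file ended up in.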
Anyway, again I apologise for starting or continuing (I forget which)
this thread. I really do understand (and agree with) the arguments for
better directory performance. I have moved to ReiserFS, mainly for the
avoidance of long fsck (power failure, children pushing buttons, alpha
and beta testing of 3D graphics drivers). I *love* being able to type
"rm -rf linux-x.y.z-acNN" and have the command prompt reappear in less
than a second. I intended merely to highlight the danger inherent in
saying to people "oh look you can put a million entries in a directory
now" :)
*whack* bad thread *die* *die*
> Pavel
--
/* Bill Crawford, Unix Systems Developer, Ebone (formerly GTS Netcom) */
#include <stddiscl>
const char *addresses[] = {
"bill@syseng.netcom.net.uk", "Bill.Crawford@ebone.com", // work
"billc@netcomuk.co.uk", "bill@eb0ne.net" // home
};