public inbox for linux-kernel@vger.kernel.org
From: Bill Crawford <bill@ops.netcom.net.uk>
To: Pavel Machek <pavel@suse.cz>
Cc: Bill Crawford <billc@netcomuk.co.uk>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@transmeta.com>,
	Daniel Phillips <phillips@innominate.de>
Subject: Re: Hashing and directories
Date: Sat, 03 Mar 2001 00:03:38 +0000
Message-ID: <3AA034DA.1C3ADA41@ops.netcom.net.uk>
In-Reply-To: <3A959BFD.B18F833@netcomuk.co.uk> <20000101020213.D28@(none)>

Pavel Machek wrote:

> Hi!

> >  I was hoping to point out that in real life, most systems that
> > need to access large numbers of files are already designed to do
> > some kind of hashing, or at least to divide-and-conquer by using
> > multi-level directory structures.

> Yes -- because they work around kernel slowness.

 Not just the kernel ... because we use NFS a lot, directory searches
are a fair bit quicker with smaller directories (especially when
looking through them by hand).

> I had to do this kind of hashing because the kernel disliked 70000 HTML
> files (a copy of train timetables).

> BTW, try rm * with 70000 files in a directory -- the command line will overflow.

 Sort of my point, again.  There are limits to what is sane.

 Another example I have cited -- our ticketing system -- is a good one.
If there is subdivision, it can be easier to search subsets of the data.
Can you imagine a source tree with 10k files, all in one directory?  I
think *people* need subdivision more than the machines do, a lot of the
time.  Another example would be mailboxes ... I have started to build a
hierarchy of mail folders because I have more than a screenful.

> Yes? Is it easier to type cat timetab1/2345 than cat timetab12345? With a bigger
> command line size, putting it into *one* directory is definitely easier.

 IMO (strictly my own) it is often easier to have things subdivided.
I have had to split up my archive of linux tarballs and patches because
it was getting too big to vgrep.

> >  A couple of practical examples from work here at Netcom UK (now
> > Ebone :) would be, say, DNS zone files or user authentication data.
> > We use Solaris and NFS a lot, too, which makes large directories a bad
> > thing in general for us, so we tend to subdivide things using a
> > very simple scheme: taking the first letter and then sometimes
> > the second letter or a pair of letters from the filename.  This
> > actually works extremely well in practice, and as mentioned above
> > provides some positive side-effects.

> Positive? Try listing all names that contain "linux" with such a scheme. I'll
> do ls *linux*. You'll need ls */*linux* ?l/inux* li/nux*. Seems ugly to
> me.

 It's not that bad, as we tend to be fairly consistent in a scheme.  I
only have to remember one of those combinations at a time :)

 Anyway, again I apologise for starting or continuing (I forget which)
this thread.  I really do understand (and agree with) the arguments for
better directory performance.  I have moved to ReiserFS, mainly for the
avoidance of long fsck (power failure, children pushing buttons, alpha
and beta testing of 3D graphics drivers).  I *love* being able to type
"rm -rf linux-x.y.z-acNN" and have the command prompt reappear in less
than a second.  I intended merely to highlight the danger inherent in
saying to people "oh look you can put a million entries in a directory
now" :)

 *whack* bad thread *die* *die*

>                                                                 Pavel

-- 
/* Bill Crawford, Unix Systems Developer, Ebone (formerly GTS Netcom) */
#include <stddiscl>
const char *addresses[] = {
    "bill@syseng.netcom.net.uk", "Bill.Crawford@ebone.com",     // work
    "billc@netcomuk.co.uk", "bill@eb0ne.net"                    // home
};

