public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: torvalds@transmeta.com (Linus Torvalds)
To: linux-kernel@vger.kernel.org
Subject: Re: [rfc] Near-constant time directory index for Ext2
Date: 21 Feb 2001 15:59:00 -0800	[thread overview]
Message-ID: <971ko4$1e7$1@penguin.transmeta.com> (raw)
In-Reply-To: <971i36$180$1@penguin.transmeta.com> <XFMail.20010221153438.davidel@xmailserver.org>

In article <XFMail.20010221153438.davidel@xmailserver.org>,
Davide Libenzi  <davidel@xmailserver.org> wrote:
>
>Yep, 4 is not good as a shifting factor. Prime number are the better choice for
>this stuff.

Oh, absolutely.

It looks like the hash function was done rather early on in the dcache
lifetime (one of the first things), back when nobody cared about whether
it was really good or not because there were many much more complicated
questions like "how the h*ll will this all ever work" ;)

And at no point did anybody ever go back and verify whether the hash
function made much sense or not.

We had another boo-boo with the actual _folding_ of the "full" hash
value into the actual hash chain pointer that is done when the name
cache is actually looked up, which was even more embarrassing: even if
the hash ended up being ok, we would remove most of the valid bits from
it because it would under certain circumstances (512MB of RAM on x86)
basically xor itself with itself. 

That took quite a while to find too - the code still worked fine, it
just had a horrible distribution on machines with half a gig of memory.

>The issue to have a good distribution is not only to have a good hashing
>function, but also to give this function not correlated data.
>Good hashing function for a Domain A may not be so good for a Domain B.

This is not something we can do all that much about.  The data we get is
generated by the user, and can basically be a random string of
characters.  HOWEVER, there are certainly tons of _usual_ data, and
while there's no way to select the data we can at least try to make sure
that the distribution is good for the normal case (ie regular ASCII
filenames, not forgetting the fact that many people use more interesting
encodings)

		Linus

  reply	other threads:[~2001-02-22  0:00 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-02-20 15:04 [rfc] Near-constant time directory index for Ext2 Daniel Phillips
2001-02-20 20:03 ` Linus Torvalds
2001-02-20 21:08   ` Jeremy Jackson
2001-02-20 21:20     ` Mike Dresser
2001-02-20 22:36       ` Jeremy Jackson
2001-02-20 23:08         ` Daniel Phillips
2001-02-21  1:04           ` Bernd Eckenfels
2001-02-21 16:38             ` Daniel Phillips
2001-02-20 22:58       ` Jonathan Morton
2001-02-20 21:41   ` Daniel Phillips
2001-02-21  0:22     ` Linus Torvalds
2001-02-21  0:30       ` Alan Cox
2001-02-21  2:35         ` Ed Tomlinson
2001-02-21 23:13           ` Linus Torvalds
2001-02-21 23:34             ` Davide Libenzi
2001-02-21 23:59               ` Linus Torvalds [this message]
2001-02-21 23:57             ` H. Peter Anvin
2001-02-22  0:35             ` Ed Tomlinson
2001-02-21  1:01       ` Andreas Dilger
2001-02-22  2:28       ` Daniel Phillips
2001-02-22  3:30         ` Linus Torvalds
2001-02-22 16:33           ` Chris Mason
2001-02-22 22:30           ` Daniel Phillips
2001-02-21 17:21 ` Davide Libenzi
2001-02-21 21:08   ` Martin Mares
2001-02-21 21:29     ` Davide Libenzi
2001-02-21 21:32       ` Martin Mares
2001-02-21 21:59         ` Davide Libenzi
2001-02-21 22:26           ` Martin Mares
2001-02-21 22:43             ` Davide Libenzi
2001-02-21 22:14         ` H. Peter Anvin
2001-02-21 22:32           ` Martin Mares
2001-02-21 22:38             ` H. Peter Anvin
2001-02-21 22:50               ` Martin Mares
2001-02-21 22:54                 ` H. Peter Anvin
2001-02-21 23:07                   ` Martin Mares
2001-02-21 23:15                     ` H. Peter Anvin
2001-02-21 23:42                       ` Daniel Phillips
2001-02-21 23:52                         ` Davide Libenzi
     [not found]                       ` <3A945081.E6EB78F4@innominate.de>
2001-02-21 23:48                         ` H. Peter Anvin
2001-02-22  1:22                           ` Daniel Phillips
2001-02-22  1:42                             ` H. Peter Anvin
2001-02-22  2:03                             ` Andreas Dilger
2001-02-22  2:41                               ` H. Peter Anvin
2001-02-22  3:43                                 ` Daniel Phillips
2001-02-22  4:02                                   ` Linus Torvalds
2001-02-22  5:19                                     ` Linus Torvalds
2001-02-22 11:31                                       ` Ingo Oeser
2001-02-22 18:20                                         ` Linus Torvalds
2001-02-22  4:02                                   ` H. Peter Anvin
2001-02-22  7:03                                     ` Andreas Dilger
2001-02-22  4:03                                   ` H. Peter Anvin
2001-02-22 10:35                                     ` Alan Cox
2001-02-23  0:59                                       ` Felix von Leitner
2001-02-22  3:08                               ` Daniel Phillips
2001-02-22  8:06                                 ` [rfc] [LONG] " Andreas Dilger
2001-02-22  7:20                       ` [rfc] " Bill Wendling
2001-02-22  8:34                       ` Rogier Wolff
2001-02-21 23:26                     ` Jamie Lokier
2001-02-22 19:04                     ` Kai Henningsen
2001-02-22  6:23 ` [Ext2-devel] " tytso
2001-02-22  7:24   ` Daniel Phillips
2001-02-22 13:20     ` tytso
2001-02-22 18:16       ` Andreas Dilger
2001-02-22 23:04         ` Daniel Phillips
2001-02-23 20:11           ` tytso
2001-02-24  0:32             ` Andreas Dilger
2001-02-22 23:40         ` tytso
2001-02-22 18:38 ` Kai Henningsen
     [not found] <Pine.LNX.4.10.10102211740550.1933-100000@coffee.psychology.mcmaster.ca>
2001-02-21 22:47 ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2001-02-23  1:52 Andries.Brouwer
2001-02-23 21:43 ` Ralph Loader
2001-02-23 22:37   ` Guest section DW
2001-02-24  2:47     ` Ralph Loader
2001-02-24  5:34     ` Ralph Loader
2001-02-23  2:49 Andries.Brouwer
2001-02-23  3:42 ` Daniel Phillips
2001-02-23 12:20 ` Jonathan Morton
2001-02-23 18:57   ` Andreas Dilger
2001-02-23 12:38 Andries.Brouwer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='971ko4$1e7$1@penguin.transmeta.com' \
    --to=torvalds@transmeta.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox