public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andries.Brouwer@cwi.nl
To: Linux-kernel@vger.kernel.org, kaih@khms.westfalen.de
Subject: Re: [rfc] Near-constant time directory index for Ext2
Date: Fri, 23 Feb 2001 02:52:48 +0100 (MET)	[thread overview]
Message-ID: <UTC200102230152.CAA138669.aeb@vlet.cwi.nl> (raw)

>> idea of using crc32 as the default hash function

> I once liked those things, too - but I've learned better since.

> A universal class of hashing functions is a class with the property that
> given any input, the average performance of all the functions is good.
> For example, h(k) = (a * k + b) mod m with integers a != 0 and b is
> a universal class of hash functions.

Here a != 0 should be (a,m) = 1.

> for(i=0; *s; s++) i = 131*i + *s;

Yes, that is a good function.

(Also because 131 has only 3 bits, so that there is a cheap shift and add
implementation.)

I did some random tests, on the one hand on a collection of 557398 file
names (last part of path names) in a file system here.
On the other hand on an artificially generated sequence of file names
with increasing number tails: foo0001, foo0002, ...

On the first collection the choice of multiplier didnt really matter
provided that it was odd and not too close to a power of two.
The smallest number with good behaviour was 11, the winner was 51.

(51 has 4 bits, but is not more expensive because they are evenly spaced:
	/* 51x = 17*3*x */
	x += (x << 1);
	x += (x << 4);
)

On the second collection there were large differences between multipliers.
The clear winner was 11.

Some numbers:

Hash 557398 actual names, using
	hash(unsigned char *s) {
	        unsigned int h = 0;

		while(*s)
			h = m*h + *s++;
		return h % sz;
	}
for various values of m and powers of two sz (so that the % is an AND).
Report min, max, average length of hash chain, and standard deviation.
Of course min and max should be close to average and stddev should be small.

m= 11 sz=2048, min 221, max 324, av 272.17, stddev 254.12
m= 13 sz=2048, min 219, max 322, av 272.17, stddev 259.96
m= 51 sz=2048, min 218, max 325, av 272.17, stddev 265.52
m=131 sz=2048, min 222, max 344, av 272.17, stddev 264.20

m= 11 sz=4096, min  96, max 176, av 136.08, stddev 132.58
m= 13 sz=4096, min 101, max 177, av 136.08, stddev 128.71
m= 51 sz=4096, min  92, max 174, av 136.08, stddev 130.89
m=131 sz=4096, min  85, max 180, av 136.08, stddev 131.99

m= 11 sz=8192, min 38, max 102, av 68.04, stddev 68.26
m= 13 sz=8192, min 42, max 100, av 68.04, stddev 66.30
m= 51 sz=8192, min 41, max  97, av 68.04, stddev 64.98
m=131 sz=8192, min 39, max 102, av 68.04, stddev 66.19

m= 11 sz=16384, min 14, max 57, av 34.02, stddev 33.96
m= 13 sz=16384, min 14, max 58, av 34.02, stddev 33.51
m= 51 sz=16384, min 15, max 60, av 34.02, stddev 32.29
m=131 sz=16384, min 16, max 59, av 34.02, stddev 33.94

m= 11 sz=32768, min 3, max 37, av 17.01, stddev 17.50
m= 13 sz=32768, min 3, max 34, av 17.01, stddev 16.84
m= 51 sz=32768, min 4, max 41, av 17.01, stddev 16.46
m=131 sz=32768, min 3, max 40, av 17.01, stddev 16.90

m= 11 sz=65536, min 0, max 24, av 8.51, stddev 8.70
m= 13 sz=65536, min 0, max 23, av 8.51, stddev 8.56
m= 51 sz=65536, min 0, max 24, av 8.51, stddev 8.31
m=131 sz=65536, min 0, max 23, av 8.51, stddev 8.51

m= 11 sz=131072, min 0, max 17, av 4.25, stddev 4.39
m= 13 sz=131072, min 0, max 16, av 4.25, stddev 4.32
m= 51 sz=131072, min 0, max 16, av 4.25, stddev 4.22
m=131 sz=131072, min 0, max 16, av 4.25, stddev 4.24

m= 11 sz=262144, min 0, max 12, av 2.13, stddev 2.20
m= 13 sz=262144, min 0, max 12, av 2.13, stddev 2.18
m= 51 sz=262144, min 0, max 12, av 2.13, stddev 2.12
m=131 sz=262144, min 0, max 12, av 2.13, stddev 2.12

On the second, nonrandom, collection there are more variations:

m= 11 sz=8192, min 61, max 76, av 68.04, stddev  4.41
m= 13 sz=8192, min 55, max 83, av 68.04, stddev 18.64
m= 51 sz=8192, min 58, max 79, av 68.04, stddev 12.47
m=131 sz=8192, min 52, max 83, av 68.04, stddev 29.05

m= 11 sz=16384, min 26, max 41, av 34.02, stddev  3.61
m= 13 sz=16384, min 24, max 45, av 34.02, stddev  8.76
m= 51 sz=16384, min 25, max 44, av 34.02, stddev  6.32
m=131 sz=16384, min 23, max 47, av 34.02, stddev 14.00

m= 11 sz=32768, min 10, max 23, av 17.01, stddev 4.36
m= 13 sz=32768, min  7, max 28, av 17.01, stddev 8.66
m= 51 sz=32768, min 10, max 25, av 17.01, stddev 4.04
m=131 sz=32768, min  6, max 27, av 17.01, stddev 8.66

Andries

             reply	other threads:[~2001-02-23  1:53 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-02-23  1:52 Andries.Brouwer [this message]
2001-02-23 21:43 ` [rfc] Near-constant time directory index for Ext2 Ralph Loader
2001-02-23 22:37   ` Guest section DW
2001-02-24  2:47     ` Ralph Loader
2001-02-24  5:34     ` Ralph Loader
  -- strict thread matches above, loose matches on Subject: below --
2001-02-23 12:38 Andries.Brouwer
2001-02-23  2:49 Andries.Brouwer
2001-02-23  3:42 ` Daniel Phillips
2001-02-23 12:20 ` Jonathan Morton
2001-02-23 18:57   ` Andreas Dilger
     [not found] <Pine.LNX.4.10.10102211740550.1933-100000@coffee.psychology.mcmaster.ca>
2001-02-21 22:47 ` H. Peter Anvin
2001-02-20 15:04 Daniel Phillips
2001-02-20 20:03 ` Linus Torvalds
2001-02-20 21:08   ` Jeremy Jackson
2001-02-20 21:20     ` Mike Dresser
2001-02-20 22:36       ` Jeremy Jackson
2001-02-20 23:08         ` Daniel Phillips
2001-02-21  1:04           ` Bernd Eckenfels
2001-02-21 16:38             ` Daniel Phillips
2001-02-20 22:58       ` Jonathan Morton
2001-02-20 21:41   ` Daniel Phillips
2001-02-21  0:22     ` Linus Torvalds
2001-02-21  0:30       ` Alan Cox
2001-02-21  2:35         ` Ed Tomlinson
2001-02-21 23:13           ` Linus Torvalds
2001-02-21 23:34             ` Davide Libenzi
2001-02-21 23:59               ` Linus Torvalds
2001-02-21 23:57             ` H. Peter Anvin
2001-02-22  0:35             ` Ed Tomlinson
2001-02-21  1:01       ` Andreas Dilger
2001-02-22  2:28       ` Daniel Phillips
2001-02-22  3:30         ` Linus Torvalds
2001-02-22 16:33           ` Chris Mason
2001-02-22 22:30           ` Daniel Phillips
2001-02-21 17:21 ` Davide Libenzi
2001-02-21 21:08   ` Martin Mares
2001-02-21 21:29     ` Davide Libenzi
2001-02-21 21:32       ` Martin Mares
2001-02-21 21:59         ` Davide Libenzi
2001-02-21 22:26           ` Martin Mares
2001-02-21 22:43             ` Davide Libenzi
2001-02-21 22:14         ` H. Peter Anvin
2001-02-21 22:32           ` Martin Mares
2001-02-21 22:38             ` H. Peter Anvin
2001-02-21 22:50               ` Martin Mares
2001-02-21 22:54                 ` H. Peter Anvin
2001-02-21 23:07                   ` Martin Mares
2001-02-21 23:15                     ` H. Peter Anvin
2001-02-21 23:42                       ` Daniel Phillips
2001-02-21 23:52                         ` Davide Libenzi
     [not found]                       ` <3A945081.E6EB78F4@innominate.de>
2001-02-21 23:48                         ` H. Peter Anvin
2001-02-22  1:22                           ` Daniel Phillips
2001-02-22  1:42                             ` H. Peter Anvin
2001-02-22  2:03                             ` Andreas Dilger
2001-02-22  2:41                               ` H. Peter Anvin
2001-02-22  3:43                                 ` Daniel Phillips
2001-02-22  4:02                                   ` Linus Torvalds
2001-02-22  5:19                                     ` Linus Torvalds
2001-02-22 11:31                                       ` Ingo Oeser
2001-02-22 18:20                                         ` Linus Torvalds
2001-02-22  4:02                                   ` H. Peter Anvin
2001-02-22  7:03                                     ` Andreas Dilger
2001-02-22  4:03                                   ` H. Peter Anvin
2001-02-22 10:35                                     ` Alan Cox
2001-02-23  0:59                                       ` Felix von Leitner
2001-02-22  3:08                               ` Daniel Phillips
2001-02-22  7:20                       ` Bill Wendling
2001-02-22  8:34                       ` Rogier Wolff
2001-02-21 23:26                     ` Jamie Lokier
2001-02-22 19:04                     ` Kai Henningsen
2001-02-22 18:38 ` Kai Henningsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=UTC200102230152.CAA138669.aeb@vlet.cwi.nl \
    --to=andries.brouwer@cwi.nl \
    --cc=Linux-kernel@vger.kernel.org \
    --cc=kaih@khms.westfalen.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox