public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: wli@parcelfarce.linux.theplanet.co.uk
To: andrea@suse.de
Cc: "Richard B. Johnson" <root@chaos.analogic.com>,
	linux-kernel@vger.kernel.org, riel@surriel.com,
	hch@infradead.org, phillips@bonn-fries.net
Subject: Re: 2.4.19pre2aa1
Date: Tue, 12 Mar 2002 04:19:58 +0000	[thread overview]
Message-ID: <20020312041958.C687@holomorphy.com> (raw)

On Fri, Mar 08, 2002 at 01:22:39AM +0100, Andrea Arcangeli wrote:
> -	"i" is the random not predictable input
> -	"i" is generated by dropping the non random bits of the "page"
> 		with a cute trick that drops the bit with
> 		a shiftright with an immediate argument, it's not a
> 		division so it's faster it could been
> 		"page/sizeof(struct page)" but it would been slower
> -	s(i) takes into account not just the last "wait_table->shift" bytes
> 	of the random information in "i", but also the next "shift"
> 	bytes, so it's more random even if some of those lower or higher
> 	bits is generating patterns for whatever reason
> -	finally you mask out the resulting random information s(i) with
> 	size-1, so you fit into the wait_table->head memory.
> The result is that it exploits most of the random input, and physically
> consecutive pages are liekly to use the same cacheline in case there is
> some pattern. That looks a good algorithm to me.

I'm sorry, this reasoning is too flawed to address in detail. I'm a
programmer, not a schoolteacher. I'll drop a hint: "randomness" is not
the right word to use. The way to make a meaningful statement is a
statistical measurement of how close the resulting bucket distributions
are to uniform, as uniform distributions across buckets minimize collisions.
I suggest the usual chi^2 and Kolmogorov-Smirnov (and variants) tests.

The results of these tests are simple: they will compute the
probability that the difference between the measured distribution and a
true uniform distribution could not have occurred by chance (i.e.
whether its difference from the uniform distribution is statistically
significant). With these things in hand you should be able to make more
meaningful assertions about your hashing algorithms.


Cheers,
Bill

P.S.:	The quality (or lack thereof) of this hash function is already
	visible in histograms of the pagecache bucket distribution on
	larger systems. Statistical measurements of its quality reflect
	this very clearly as well. Someone needs to move the stable out
	of the Sahara and closer to a lake.

	http://www.samba.org/~anton/linux/pagecache/pagecache_before.png

	is a histogram of the pagecache hash function's bucket distribution
	on an SMP ppc64 machine after some benchmark run.

	http://www.samba.org/~anton/linux/pagecache/pagecache_after.png

	has a histogram of a Fibonacci hashing hash function's bucket
	distribution on the same machine after an identical benchmark run.

	(There is more to come from the hash table analysis front.)

             reply	other threads:[~2002-03-12  4:20 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-03-12  4:19 wli [this message]
2002-03-12  5:31 ` 2.4.19pre2aa1 wli
2002-03-12  6:36   ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12  6:06 ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 10:46   ` 2.4.19pre2aa1 Rik van Riel
2002-03-12 11:47     ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 11:48       ` 2.4.19pre2aa1 wli
2002-03-12 12:21       ` 2.4.19pre2aa1 Daniel Phillips
2002-03-12 14:25         ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 14:32           ` 2.4.19pre2aa1 Daniel Phillips
2002-03-15 17:20           ` 2.4.19pre2aa1 Horst von Brand
2002-03-15 16:43             ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 11:29   ` 2.4.19pre2aa1 wli
2002-03-12 12:56     ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 13:20       ` 2.4.19pre2aa1 Rik van Riel
2002-03-12 13:33       ` 2.4.19pre2aa1 Richard B. Johnson
2002-03-12 14:17         ` 2.4.19pre2aa1 wli
2002-03-12 14:30           ` 2.4.19pre2aa1 Richard B. Johnson
2002-03-13  2:18         ` 2.4.19pre2aa1 wli
2002-03-13 19:06           ` 2.4.19pre2aa1 Richard B. Johnson
2002-03-13 22:10             ` 2.4.19pre2aa1 wli
2002-03-14 12:18               ` 2.4.19pre2aa1 Richard B. Johnson
2002-03-14 12:47                 ` 2.4.19pre2aa1 Daniel Phillips
2002-03-14 13:59                 ` 2.4.19pre2aa1 Rik van Riel
2002-03-14 14:02                   ` 2.4.19pre2aa1 Daniel Phillips
2002-03-12 14:14       ` 2.4.19pre2aa1 wli
2002-03-12 15:04         ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-12 23:31           ` 2.4.19pre2aa1 wli
2002-03-13  0:09             ` 2.4.19pre2aa1 Andrew Morton
2002-03-13  1:06               ` 2.4.19pre2aa1 wli
2002-03-13  1:24                 ` 2.4.19pre2aa1 Andrew Morton
2002-03-13  7:37                   ` 2.4.19pre2aa1 Daniel Phillips
2002-03-13  7:30               ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-13  7:55                 ` 2.4.19pre2aa1 Andrew Morton
2002-03-13  8:06                   ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-13 10:57                 ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-13 13:51                   ` 2.4.19pre2aa1 Rik van Riel
2002-03-13 14:03                     ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-13 19:19                     ` 2.4.19pre2aa1 Andrew Morton
2002-03-13  8:12               ` 2.4.19pre2aa1 Daniel Phillips
  -- strict thread matches above, loose matches on Subject: below --
2002-03-08  2:40 2.4.19pre2aa1 rwhron
2002-03-07  8:21 2.4.19pre2aa1 Andrea Arcangeli
2002-03-07 10:49 ` 2.4.19pre2aa1 William Lee Irwin III
2002-03-07 11:27   ` 2.4.19pre2aa1 Daniel Phillips
2002-03-07 11:47     ` 2.4.19pre2aa1 William Lee Irwin III
2002-03-07 11:46       ` 2.4.19pre2aa1 Daniel Phillips
2002-03-07 17:03   ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-07 20:18     ` 2.4.19pre2aa1 William Lee Irwin III
2002-03-07 20:38       ` 2.4.19pre2aa1 Richard B. Johnson
2002-03-08  0:22         ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-08  0:26           ` 2.4.19pre2aa1 Rik van Riel
2002-03-08  0:11       ` 2.4.19pre2aa1 Andrea Arcangeli
2002-03-07 11:34 ` 2.4.19pre2aa1 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020312041958.C687@holomorphy.com \
    --to=wli@parcelfarce.linux.theplanet.co.uk \
    --cc=andrea@suse.de \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillips@bonn-fries.net \
    --cc=riel@surriel.com \
    --cc=root@chaos.analogic.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox