From: Mike Fedyk <mfedyk@matchmail.com>
To: jw schultz <jw@pegasys.ws>, linux-kernel@vger.kernel.org
Subject: Re: Transparent compression in the FS
Date: Sun, 26 Oct 2003 18:22:31 -0800 [thread overview]
Message-ID: <20031027022231.GD4511@matchmail.com> (raw)
In-Reply-To: <20031016230448.GA29279@pegasys.ws>
On Thu, Oct 16, 2003 at 04:04:48PM -0700, jw schultz wrote:
> 2. External collision: The smaller hash size to unit count
> ratio the greater liklihood of false positives. To put
> it in the extreme: if you have 10,000 blocks you are
> almost guaranteed to get false positives on a 16 bit
> hash. This is similar to filling the namespace. It is
> external collision that was killing rsync on iso images.
What about making one bit changes in the block at a random location (the
same bit on both sides), and comparing the hashes again.
This would work for something that is trying to save bandwidth over a
network link, but wouldn't save anything in a Compare-by-hash File System
(why hash again when you need to read the block anyway?).
One thought for rsync would be to have a 4 step process (the first two are
how it operates now according to another sub-thread):
1) Check the block with a weak 16bit hash
2) Check the block with a stronger 32bit hash (if step 1 had a collision)
3) Check the block with SHA1 (if step 2 had a collision)
4) Make a one bit change at a random location in the block, but at the same
place on both ends of the transfer and repeat steps 1 to 3
It's arguable how much of an optimization the 16bit hash is, but it's there
and maybe it's a win, and maybe not. It's also arguable just how much
bandwidth you're going to save with the extra steps 3, and the recursive 4.
Maybe 4 should be just a recheck with SHA1, and the recursive check isn't
needed, but I haven't done any hash benchmarks.
next prev parent reply other threads:[~2003-10-27 2:22 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-10-14 20:30 Transparent compression in the FS Josh Litherland
2003-10-15 13:33 ` Erik Mouw
2003-10-15 13:45 ` Josh Litherland
2003-10-15 13:50 ` Nikita Danilov
2003-10-15 14:27 ` Erik Mouw
2003-10-15 14:33 ` Nikita Danilov
2003-10-15 15:54 ` Richard B. Johnson
2003-10-15 16:21 ` Nikita Danilov
2003-10-15 17:19 ` Richard B. Johnson
2003-10-15 17:37 ` Andreas Dilger
2003-10-15 17:48 ` Dave Jones
2003-10-15 18:19 ` Richard B. Johnson
2003-10-15 18:06 ` Hans Reiser
2003-10-17 12:51 ` Edward Shushkin
2003-10-15 16:04 ` Erik Mouw
2003-10-15 17:24 ` Josh Litherland
2003-10-15 18:53 ` Erik Bourget
2003-10-15 19:03 ` Geert Uytterhoeven
2003-10-15 19:14 ` Valdis.Kletnieks
2003-10-15 19:24 ` Geert Uytterhoeven
2003-10-15 18:54 ` root
2003-10-16 2:11 ` Chris Meadors
2003-10-16 3:01 ` Shawn
2003-10-15 14:47 ` Erik Bourget
2003-10-15 15:05 ` Nikita Danilov
2003-10-15 15:06 ` Erik Bourget
2003-10-15 21:36 ` Tomas Szepe
2003-10-16 8:04 ` Ville Herva
2003-10-17 1:32 ` Eric W. Biederman
2003-10-15 15:13 ` Jeff Garzik
2003-10-15 21:00 ` Christopher Li
2003-10-16 16:29 ` Andrea Arcangeli
2003-10-16 16:41 ` P
2003-10-16 17:20 ` Jeff Garzik
2003-10-16 23:12 ` jw schultz
2003-10-17 8:03 ` John Bradford
2003-10-17 14:53 ` Eli Carter
2003-10-17 15:27 ` John Bradford
2003-10-17 16:22 ` Eli Carter
2003-10-17 17:15 ` John Bradford
2003-10-16 17:10 ` Jeff Garzik
2003-10-16 17:41 ` Andrea Arcangeli
2003-10-16 17:29 ` Larry McVoy
2003-10-16 17:49 ` Val Henson
2003-10-16 21:02 ` Jeff Garzik
2003-10-16 21:18 ` Chris Meadors
2003-10-16 21:25 ` Jeff Garzik
2003-10-16 21:33 ` Davide Libenzi
2003-10-17 3:47 ` Mark Mielke
2003-10-17 14:31 ` Jörn Engel
2003-10-16 23:04 ` jw schultz
2003-10-16 23:30 ` Jeff Garzik
2003-10-16 23:58 ` jw schultz
2003-10-16 23:53 ` David Lang
2003-10-17 1:19 ` Jeff Garzik
2003-10-17 0:45 ` Christopher Li
2003-10-17 1:16 ` Jeff Garzik
2003-10-17 1:32 ` jlnance
2003-10-17 1:47 ` Eric Sandall
2003-10-17 8:11 ` John Bradford
2003-10-17 17:53 ` Eric Sandall
2003-10-17 13:07 ` jlnance
2003-10-17 14:16 ` Jeff Garzik
2003-10-17 15:06 ` Valdis.Kletnieks
2003-10-17 1:49 ` Davide Libenzi
2003-10-17 1:59 ` Larry McVoy
2003-10-17 2:19 ` jw schultz
2003-10-17 9:44 ` Pavel Machek
2003-10-17 12:33 ` jlnance
2003-10-17 18:23 ` jw schultz
2003-10-27 2:08 ` Mike Fedyk
2003-10-27 2:15 ` jw schultz
2003-10-27 2:22 ` Mike Fedyk [this message]
2003-10-27 2:45 ` jw schultz
2003-10-16 18:28 ` John Bradford
2003-10-16 18:31 ` Robert Love
2003-10-16 20:18 ` Jeff Garzik
2003-10-16 18:43 ` Muli Ben-Yehuda
2003-10-16 18:56 ` Richard B. Johnson
2003-10-16 19:00 ` Robert Love
2003-10-16 19:27 ` John Bradford
2003-10-16 19:03 ` John Bradford
2003-10-16 19:20 ` Richard B. Johnson
2003-10-17 13:16 ` Ingo Oeser
2003-10-16 23:20 ` jw schultz
2003-10-17 14:47 ` Eli Carter
2003-10-16 8:27 ` tconnors+linuxkernel1066292516
2003-10-17 10:55 ` Ingo Oeser
2003-10-15 16:25 ` David Woodhouse
2003-10-15 16:56 ` Andreas Dilger
2003-10-15 17:44 ` David Woodhouse
[not found] <GTJr.60q.17@gated-at.bofh.it>
[not found] ` <GU2N.6v7.17@gated-at.bofh.it>
[not found] ` <GVBC.Ep.23@gated-at.bofh.it>
[not found] ` <Hjkq.3Al.1@gated-at.bofh.it>
[not found] ` <Hkgx.4Vu.7@gated-at.bofh.it>
[not found] ` <HkA0.5lh.9@gated-at.bofh.it>
[not found] ` <HnxT.3BB.27@gated-at.bofh.it>
2003-10-17 8:15 ` Ihar 'Philips' Filipau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20031027022231.GD4511@matchmail.com \
--to=mfedyk@matchmail.com \
--cc=jw@pegasys.ws \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).