From: Jeff Garzik <jgarzik@pobox.com>
To: Val Henson <val@nmt.edu>
Cc: Larry McVoy <lm@work.bitmover.com>, linux-kernel@vger.kernel.org
Subject: Re: Transparent compression in the FS
Date: Thu, 16 Oct 2003 17:02:30 -0400 [thread overview]
Message-ID: <3F8F0766.30405@pobox.com> (raw)
In-Reply-To: <20031016174927.GB25836@speare5-1-14>
Val Henson wrote:
> Abstract:
>
> "Recent research has produced a new and perhaps dangerous technique
> for uniquely identifying blocks that I will call
> compare-by-hash. Using this technique, we decide whether two blocks
> are identical to each other by comparing their hash values, using a
> collision-resistant hash such as SHA-1. If the hash values match,
> we assume the blocks are identical without further ado. Users of
> compare-by-hash argue that this assumption is warranted because the
> chance of a hash collision between any two randomly generated blocks
> is estimated to be many orders of magnitude smaller than the chance
> of many kinds of hardware errors. Further analysis shows that this
> approach is not as risk-free as it seems at first glance."
I'm curious if anyone has done any work on using multiple different
checksums? For example, the cost of checksumming a single block with
multiple algorithms (sha1+md5+crc32 for a crazy example), and storing
each checksum (instead of just one sha1 sum), may be faster than reading
the block off of disk to compare it with the incoming block. OTOH,
there is still a mathematical possibility (however-more-remote) of a
collission...
With these sorts of schemes, from basic compare-by-hash to one that
actually checks the data for a collission, you take a hit no matter what
when writing, but reading is still full-speed-ahead. (if anyone is
curious, Plan9's venti uses a multi-GB "write buffer", to mitigate the
effect of the checksumming on throughput)
So it's easy to tweak the writing algorithms pretty much any which way,
as long as the outcome is a unique tag for each block. Hash collisions
between two files, for example, could be resolved easily by making each
unique tag "$sha1_hash $n", where $n is the unique number
differentiating two blocks with the same SHA1 hash.
Jeff
next prev parent reply other threads:[~2003-10-16 21:02 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-10-14 20:30 Transparent compression in the FS Josh Litherland
2003-10-15 13:33 ` Erik Mouw
2003-10-15 13:45 ` Josh Litherland
2003-10-15 13:50 ` Nikita Danilov
2003-10-15 14:27 ` Erik Mouw
2003-10-15 14:33 ` Nikita Danilov
2003-10-15 15:54 ` Richard B. Johnson
2003-10-15 16:21 ` Nikita Danilov
2003-10-15 17:19 ` Richard B. Johnson
2003-10-15 17:37 ` Andreas Dilger
2003-10-15 17:48 ` Dave Jones
2003-10-15 18:19 ` Richard B. Johnson
2003-10-15 18:06 ` Hans Reiser
2003-10-17 12:51 ` Edward Shushkin
2003-10-15 16:04 ` Erik Mouw
2003-10-15 17:24 ` Josh Litherland
2003-10-15 18:53 ` Erik Bourget
2003-10-15 19:03 ` Geert Uytterhoeven
2003-10-15 19:14 ` Valdis.Kletnieks
2003-10-15 19:24 ` Geert Uytterhoeven
2003-10-15 18:54 ` root
2003-10-16 2:11 ` Chris Meadors
2003-10-16 3:01 ` Shawn
2003-10-15 14:47 ` Erik Bourget
2003-10-15 15:05 ` Nikita Danilov
2003-10-15 15:06 ` Erik Bourget
2003-10-15 21:36 ` Tomas Szepe
2003-10-16 8:04 ` Ville Herva
2003-10-17 1:32 ` Eric W. Biederman
2003-10-15 15:13 ` Jeff Garzik
2003-10-15 21:00 ` Christopher Li
2003-10-16 16:29 ` Andrea Arcangeli
2003-10-16 16:41 ` P
2003-10-16 17:20 ` Jeff Garzik
2003-10-16 23:12 ` jw schultz
2003-10-17 8:03 ` John Bradford
2003-10-17 14:53 ` Eli Carter
2003-10-17 15:27 ` John Bradford
2003-10-17 16:22 ` Eli Carter
2003-10-17 17:15 ` John Bradford
2003-10-16 17:10 ` Jeff Garzik
2003-10-16 17:41 ` Andrea Arcangeli
2003-10-16 17:29 ` Larry McVoy
2003-10-16 17:49 ` Val Henson
2003-10-16 21:02 ` Jeff Garzik [this message]
2003-10-16 21:18 ` Chris Meadors
2003-10-16 21:25 ` Jeff Garzik
2003-10-16 21:33 ` Davide Libenzi
2003-10-17 3:47 ` Mark Mielke
2003-10-17 14:31 ` Jörn Engel
2003-10-16 23:04 ` jw schultz
2003-10-16 23:30 ` Jeff Garzik
2003-10-16 23:58 ` jw schultz
2003-10-16 23:53 ` David Lang
2003-10-17 1:19 ` Jeff Garzik
2003-10-17 0:45 ` Christopher Li
2003-10-17 1:16 ` Jeff Garzik
2003-10-17 1:32 ` jlnance
2003-10-17 1:47 ` Eric Sandall
2003-10-17 8:11 ` John Bradford
2003-10-17 17:53 ` Eric Sandall
2003-10-17 13:07 ` jlnance
2003-10-17 14:16 ` Jeff Garzik
2003-10-17 15:06 ` Valdis.Kletnieks
2003-10-17 1:49 ` Davide Libenzi
2003-10-17 1:59 ` Larry McVoy
2003-10-17 2:19 ` jw schultz
2003-10-17 9:44 ` Pavel Machek
2003-10-17 12:33 ` jlnance
2003-10-17 18:23 ` jw schultz
2003-10-27 2:08 ` Mike Fedyk
2003-10-27 2:15 ` jw schultz
2003-10-27 2:22 ` Mike Fedyk
2003-10-27 2:45 ` jw schultz
2003-10-16 18:28 ` John Bradford
2003-10-16 18:31 ` Robert Love
2003-10-16 20:18 ` Jeff Garzik
2003-10-16 18:43 ` Muli Ben-Yehuda
2003-10-16 18:56 ` Richard B. Johnson
2003-10-16 19:00 ` Robert Love
2003-10-16 19:27 ` John Bradford
2003-10-16 19:03 ` John Bradford
2003-10-16 19:20 ` Richard B. Johnson
2003-10-17 13:16 ` Ingo Oeser
2003-10-16 23:20 ` jw schultz
2003-10-17 14:47 ` Eli Carter
2003-10-16 8:27 ` tconnors+linuxkernel1066292516
2003-10-17 10:55 ` Ingo Oeser
2003-10-15 16:25 ` David Woodhouse
2003-10-15 16:56 ` Andreas Dilger
2003-10-15 17:44 ` David Woodhouse
[not found] <GTJr.60q.17@gated-at.bofh.it>
[not found] ` <GU2N.6v7.17@gated-at.bofh.it>
[not found] ` <GVBC.Ep.23@gated-at.bofh.it>
[not found] ` <Hjkq.3Al.1@gated-at.bofh.it>
[not found] ` <Hkgx.4Vu.7@gated-at.bofh.it>
[not found] ` <HkA0.5lh.9@gated-at.bofh.it>
[not found] ` <HnxT.3BB.27@gated-at.bofh.it>
2003-10-17 8:15 ` Ihar 'Philips' Filipau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3F8F0766.30405@pobox.com \
--to=jgarzik@pobox.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lm@work.bitmover.com \
--cc=val@nmt.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).