From: Adam Borowski <kilobyte@angband.pl>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: "Paul Jones" <paul@pauljones.id.au>,
"Peter Becker" <floyd.net@gmail.com>,
"Holger Hoffstätte" <holger@applied-asynchrony.com>,
"Linux BTRFS Mailinglist" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 0/4] Support xxhash64 checksums
Date: Tue, 27 Aug 2019 02:33:18 +0200
Message-ID: <20190827003318.GA25412@angband.pl>
In-Reply-To: <69ac4340-c782-aa92-246c-3106b1611eea@gmail.com>
On Mon, Aug 26, 2019 at 08:27:15AM -0400, Austin S. Hemmelgarn wrote:
> On 2019-08-23 13:08, Adam Borowski wrote:
> > the improved collision
> > resistance of xxhash64 is not a reason as if you intend to dedupe you want
> > a crypto hash so you don't need to verify.
>
> The improved collision resistance is a roughly 10 orders of magnitude
> reduction in the chance of a collision. That may not matter for most, but
> it's a significant improvement for anybody operating at large enough scale
> that media errors are commonplace.
Hash size doesn't matter vs media errors. You don't have billions of
mismatches: the first one is a cause for alarm, so a 1-in-4294967296 chance
of failing to notice it hardly ever matters (even though it _can_ happen in
real life, as opposed to the collisions below).
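As a quick sanity check of the figures above, here is a minimal Python
sketch. It assumes a corrupted block hashes to an effectively random value,
so an n-bit checksum misses a given corruption with probability 2^-n. The
helper name miss_probability is made up for illustration.

```python
import math

def miss_probability(bits: int) -> float:
    """Chance that a random corruption leaves an n-bit checksum unchanged,
    assuming the corrupted data hashes to a uniformly random value."""
    return 2.0 ** -bits

crc32_miss = miss_probability(32)   # the 1-in-4294967296 figure
xxh64_miss = miss_probability(64)

# Orders of magnitude separating a 32-bit and a 64-bit checksum.
orders = math.log10(crc32_miss / xxh64_miss)

print(f"32-bit: 1 in {2**32}")
print(f"64-bit: 1 in {2**64}")
print(f"improvement: about {orders:.1f} orders of magnitude")
```

The ratio is the "roughly 10 orders of magnitude" quoted above; the point
being argued is that the 32-bit figure is already small enough for
detecting media errors.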
I can think of three cases where a bigger hash is useful:
* recovering from a split-brain RAID
* recovering from one disk of a RAID having had a large piece scribbled upon
* finding candidates for deduplication (but see below why not 64-bit)
> Also, you would still need to verify even if you're using whatever the
> fanciest new collision resistant cryptographic hash is, because the number
> of possible input values is still more than _nine thousand_ orders of
> magnitude larger than the total number of output values even if we use a
> 512-bit cryptographic hash.
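The "nine thousand orders of magnitude" figure can be reproduced with a
short Python sketch. The 4096-byte block size is an assumption (the thread
does not state one). The result only counts possible inputs per output
value; it says nothing about how likely a collision is in practice.

```python
import math

BLOCK_BYTES = 4096   # assumed block size, not stated in the thread
HASH_BITS = 512      # hash width used in the quoted argument

input_bits = BLOCK_BYTES * 8

# There are 2**input_bits possible blocks and 2**HASH_BITS possible hashes,
# so the ratio is 2**(input_bits - HASH_BITS). Work in log10 to avoid
# building the huge number itself.
orders_of_magnitude = (input_bits - HASH_BITS) * math.log10(2)

print(f"about {orders_of_magnitude:.0f} orders of magnitude")
```

Under that assumption the ratio comes out at a bit over 9700 orders of
magnitude.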
You're underestimating how rare crypto-strength hash collisions are.
There are two scenarios: unintentional, and malicious.
Let's go with unintentional first: the age of the Universe is 2^58.5
seconds. The fastest disk (non-pmem) is NVMe-connected Optane, at 240000
IOPS. That's 2^17.8. With a 256-bit hash, the mass of machines needed for
a single expected collision within the age of the Universe exceeds the mass
of the observable Universe itself.
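The arithmetic behind this estimate can be sketched in Python. This is one
possible reading, not necessarily the exact calculation intended: it
assumes each machine hashes one block per I/O for the whole age of the
Universe, and that one expected match against a given hash value takes
about 2^256 trials. Whether the resulting machine count outweighs the
observable Universe depends on the mass per machine and the mass estimate
used, neither of which is stated here.

```python
import math

AGE_OF_UNIVERSE_LOG2_S = 58.5   # from the text: 2^58.5 seconds
IOPS = 240_000                  # from the text: NVMe-connected Optane
HASH_BITS = 256

# The text rounds this to 2^17.8.
iops_log2 = math.log2(IOPS)

# Hashes one machine produces over the age of the Universe (log2).
hashes_per_machine_log2 = AGE_OF_UNIVERSE_LOG2_S + iops_log2

# Assumption: about 2^HASH_BITS trials for one expected match.
machines_log2 = HASH_BITS - hashes_per_machine_log2
machines_log10 = machines_log2 * math.log10(2)

print(f"one machine: about 2^{hashes_per_machine_log2:.1f} hashes")
print(f"machines needed: about 2^{machines_log2:.1f} (10^{machines_log10:.0f})")
```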
So, malicious. We demand a non-broken hash, which in crypto speak means
there's no known attack better than brute force. An iterative approach is
right out; the best space-time tradeoff is a birthday attack, which requires
storage on the order of the square root of the number of combinations (i.e.,
half the hash length in bits). It's drastically better: at current best
storage densities, you'd need only the mass of the Earth.
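A rough Python sketch of the storage side of that birthday attack follows.
It assumes the attacker stores nothing but the raw 32-byte digests (no
indexes, no inputs), and it uses 5.972e24 kg for the mass of the Earth.
Instead of assuming a particular storage technology, it computes the
density an Earth-mass store would have to reach.

```python
HASH_BITS = 256
DIGEST_BYTES = HASH_BITS // 8

# Birthday bound: roughly 2^(HASH_BITS / 2) stored digests are needed
# before a collision is expected.
digests_needed = 2 ** (HASH_BITS // 2)
storage_bytes = digests_needed * DIGEST_BYTES   # 2^133 bytes

EARTH_MASS_KG = 5.972e24

# Density required if the whole store weighed as much as the Earth.
required_bytes_per_kg = storage_bytes / EARTH_MASS_KG

print(f"digests needed: 2^{HASH_BITS // 2}")
print(f"raw storage: {storage_bytes:.3e} bytes")
print(f"required density: {required_bytes_per_kg:.2e} bytes per kg")
```

The required density works out to a couple of petabytes per kilogram of
storage medium.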
Please let me know when you'll build that Earth-sized computer, so I can
migrate from weak SHA256 to e.g. BLAKE2b.
On the other hand, computers and memories get hit by cosmic rays, thermal
noise, and so on at a non-negligible rate. Any theoretical chance of a hash
collision is dwarfed by the flaws of the technology we have. Or, e.g., by the
chance that you'll get hit by multiple lightning strikes the next time you
leave your house.
Thus: no, you don't need to recheck after SHA256.
Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋ The root of a real enemy is an imaginary friend.
⠈⠳⣄⠀⠀⠀⠀