On 2014-12-01 13:37, David Sterba wrote:
> On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
>> On 2014-11-26 08:38, Brendan Hide wrote:
>>> On 2014/11/25 18:47, David Sterba wrote:
>>>> We could provide an interface for external applications that would make
>>>> use of the strong checksums. Eg. external dedup, integrity db. The
>>>> benefit here is that the checksum is always up to date, so there's no
>>>> need to compute the checksums again. At the obvious cost.
>>>
>>> I can imagine some use-cases where you might even want more than one
>>> algorithm to be used and stored. Not sure if that makes me a madman,
>>> though. ;)
>>>
>> Not crazy at all, I would love to have the ability to store multiple
>> different weak but fast hash values. For example, on my laptop, it is
>> actually faster to compute crc32c, adler32, and md5 hashes together than
>> it is to compute pretty much any 256-bit hash I've tried.
>
> Well, this is doable :) there's space for 256 bits in general, the order of
> checksum bytes in one "checksum word" would be given by fixed order the
> algorithms are defined. The code complexity would increase, but not that
> much I think.
>
>> This then brings up the issue of what to do when we try to mount such a
>> fs on a system that doesn't support some or all of the hashes used.
>
> I see two modes: first fail if all not present, or relaxed by a mount
> option to accept at least one.
>
> But let's keep this open, I'm not yet convinced that combining more weak
> algos makes sense from the crypto POV. If this should protect against
> random bitflips, would one fast-but-weak be comparable to a combination?
> Or other expectations.
>
My only reasoning is that with this set of hashes (crc32c, adler32, and
md5), the statistical likelihood of hitting a collision in more than one
of them at the same time is infinitesimally small compared to the
likelihood of a collision in any single one of them (or even compared to
something ridiculous like the probability of being killed by a meteor
strike), and the combination is faster on most systems I have tried than
many 256-bit crypto hashes. It's still a tradeoff, though. I also think
the idea mentioned elsewhere in this thread, of storing separate hashes
for subsections of the same block, is worth looking at.
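
To make the layout discussed above a bit more concrete, here is a rough
Python sketch of packing three weak checksums into one 256-bit "checksum
word" in a fixed order, with a strict and a relaxed verification mode.
This is only an illustration under stated assumptions, not btrfs code:
the names FIELDS, combined_checksum and verify are made up for the
example, the byte order and zero padding are arbitrary choices, and
zlib.crc32 stands in for crc32c because the Python standard library does
not ship crc32c.

```python
import hashlib
import struct
import zlib

CSUM_WORD_SIZE = 32  # 256 bits available per block, as mentioned in the thread

# Hypothetical fixed field order: (name, start offset, end offset) in the word.
FIELDS = (
    ("crc32", 0, 4),     # NOTE: plain CRC-32 as a stand-in for crc32c
    ("adler32", 4, 8),
    ("md5", 8, 24),
)


def _digest(name: str, block: bytes) -> bytes:
    """Return the raw digest bytes of one algorithm for a block."""
    if name == "crc32":
        return struct.pack("<I", zlib.crc32(block) & 0xFFFFFFFF)
    if name == "adler32":
        return struct.pack("<I", zlib.adler32(block) & 0xFFFFFFFF)
    if name == "md5":
        return hashlib.md5(block).digest()
    raise ValueError(f"unknown checksum algorithm: {name}")


def combined_checksum(block: bytes) -> bytes:
    """Concatenate all digests in field order and zero-pad to 256 bits."""
    word = b"".join(_digest(name, block) for name, _, _ in FIELDS)
    return word.ljust(CSUM_WORD_SIZE, b"\0")


def verify(block, word, supported=("crc32", "adler32", "md5"), relaxed=False):
    """Check a block against its checksum word.

    Strict mode (default) refuses to run unless every algorithm is
    supported; relaxed mode accepts at least one supported algorithm.
    """
    usable = [f for f in FIELDS if f[0] in supported]
    if not usable or (not relaxed and len(usable) != len(FIELDS)):
        raise RuntimeError("required checksum algorithms not available")
    return all(word[start:end] == _digest(name, block)
               for name, start, end in usable)


if __name__ == "__main__":
    data = b"\xab" * 4096
    csum = combined_checksum(data)
    print(len(csum), verify(data, csum))
    # A system that only knows one of the algorithms, in relaxed mode:
    print(verify(data, csum, supported=("crc32",), relaxed=True))
```

The three digests take 4 + 4 + 16 = 24 bytes, so they fit in the 256-bit
word with 8 bytes to spare. The same word could instead hold separate
checksums for subsections of the block, which is the other idea from
this thread.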