From: Bart Samwel <bart@samwel.tk>
To: Timothy Miller <miller@techsource.com>, Valdis.Kletnieks@vt.edu
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [OT] Redundancy eliminating file systems, breaking MD5, donating money to OSDL
Date: Sat, 17 Jan 2004 14:15:31 +0100
Message-ID: <200401171415.31645.bart@samwel.tk>
In-Reply-To: <4008509B.2060707@techsource.com>

On Friday 16 January 2004 21:59, Timothy Miller wrote:
> Valdis.Kletnieks@vt.edu wrote:
> > On Fri, 16 Jan 2004 15:22:39 EST, Timothy Miller <miller@techsource.com> said:
> >>Think about it!  If we had a filesystem that actually DID this, and it
> >>was in the Linux kernel, it would spread far and wide.  It's bound to
> >>happen that someone will identify a collision.  We then report that to
> >>the committee offering the reward and then donate it to OSDL to help
> >>Linux development.
> >
> > Actually, it's *not* "bound to happen".  Figure out the number of blocks
> > you'd need to have even a 1% chance of a birthday collision in a 2**128
> > space.
> >
> > And you'd need that many disk blocks on *a single system*.
> >
> > Then figure out the chances of a collision on a small machine that only
> > has 20 or 30 terabytes (yes, in this case terabytes is small).
>
> Certainly.  No one machine is going to find it in a reasonable period.
> OTOH, if a million machines were doing it, it increases the chances by
> just that much.

Let's take a look at the chances. 30 terabytes is, in a best-case scenario 
(with 512-byte blocks), about 6e10 blocks. The chance of a collision on one 
such machine would be roughly 6e10*6e10*(2**(-128)), or about 1e-17. With a 
hundred million machines, the 
chances of a collision would be about 1e-9, disregarding the fact that all 
these machines have a large chance of containing similar blocks -- their data 
isn't truly random, so some blocks have a larger chance of occurring than 
others. The data sets on the machines are probably reasonably static, so if 
the collision isn't found *at once* the chances of it occurring later are 
much smaller. So, even under the most favorable assumptions, with a hundred 
million machines with 30 terabytes of storage each, it's extremely probable 
that you won't find a collision. (A 96-bit hash could have been broken with 
this setup however. :) )
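
The estimate above can be checked with a short Python sketch. It uses the
standard birthday approximation p ~= 1 - exp(-n*(n-1)/2 / 2**bits) and
assumes every block hash is independent and uniformly distributed, which real
data is not. The function and variable names are made up for illustration.
Because this formula includes the factor 1/2 that the rough n*n figure omits,
the results come out at about half of the numbers quoted above, which is the
same order of magnitude.

```python
import math

def collision_probability(num_blocks: float, hash_bits: int) -> float:
    """Birthday approximation for at least one collision among num_blocks
    uniformly random hash values of hash_bits bits."""
    expected_pairs = num_blocks * (num_blocks - 1) / 2.0 / 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_pairs)

def fleet_probability(per_machine: float, machines: float) -> float:
    """Chance that at least one of `machines` independent machines sees a
    collision, given the per-machine probability."""
    return -math.expm1(machines * math.log1p(-per_machine))

# Figures taken from the message: 30 TB per machine, 512-byte blocks,
# a hundred million machines.
BLOCK_SIZE = 512
STORAGE_BYTES = 30e12
MACHINES = 1e8

blocks = STORAGE_BYTES / BLOCK_SIZE  # about 5.9e10 blocks per machine

p128 = collision_probability(blocks, 128)
fleet128 = fleet_probability(p128, MACHINES)
p96 = collision_probability(blocks, 96)
fleet96 = fleet_probability(p96, MACHINES)

print(f"blocks per machine: {blocks:.2e}")
print(f"128-bit hash: one machine {p128:.1e}, whole fleet {fleet128:.1e}")
print(f" 96-bit hash: one machine {p96:.1e}, whole fleet {fleet96:.2f}")
```

Under these assumptions the 128-bit case stays far below one in a billion
for the whole fleet, while the 96-bit case is more likely than not to produce
a collision somewhere.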

-- Bart

