All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tkil <tkil@scrye.com>
To: omb@bluewin.ch
Cc: David Lang <david.lang@digitalinsight.com>,
	Ingo Molnar <mingo@elte.hu>,
	git@vger.kernel.org
Subject: Re: SHA1 hash safety
Date: Sat, 16 Apr 2005 21:23:37 -0600	[thread overview]
Message-ID: <ghdi684sm.fsf@brand.scrye.com> (raw)
In-Reply-To: <4261132A.3090907@khandalf.com> (Brian O'Mahoney's message of "Sat, 16 Apr 2005 15:29:14 +0200")

>>>>> "Brian" == Brian O'Mahoney <omb@khandalf.com> writes:

Brian> (1) I _have_ seen real-life collisions with MD5, in the context
Brian>     of Document management systems containing ~10^6 ms-WORD
Brian>     documents.

Was this whole-document based, or was it blocked or otherwise chunked?

I'm wondering, because (SFAIK) the MS word on-disk format is some
serialized version of one or more containers, possibly nested.  If
you're blocks are sized so that the first block is the same across
multiple files, this could cause collisions -- but they're the good
kind, that allow us to save disk space, so they're not a problem.

Are you saying that, within 1e7 documents, that you found two
documents with the same MD5 hash yet different contents?

That's not an accusation, btw; I'm just trying to get clarity on the
terminology.  I'm fascinated by the idea of using this sort of
content-addressable filesystem, but the chance of any collision at all
wigs me out.  I look at the probabilities, but still.

Thanks,
t.

  parent reply	other threads:[~2005-04-17  3:19 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-16 12:24 SHA1 hash safety David Lang
2005-04-16 12:31 ` Ingo Molnar
2005-04-16 12:48   ` David Lang
2005-04-16 13:29     ` Brian O'Mahoney
2005-04-16 14:58       ` C. Scott Ananian
2005-04-16 15:11         ` Petr Baudis
2005-04-16 15:36           ` C. Scott Ananian
2005-04-16 22:56             ` David Lang
2005-04-16 23:11               ` Paul Jackson
2005-04-16 23:18                 ` Martin Mares
2005-04-17  4:38                 ` David A. Wheeler
2005-04-18  0:09                   ` Theodore Ts'o
2005-04-16 15:49         ` ross
2005-04-17  6:35           ` Horst von Brand
2005-04-18  2:07             ` Brian O'Mahoney
2005-04-18 16:50             ` C. Scott Ananian
2005-04-16 19:16         ` Paul Jackson
2005-04-16 21:35         ` Brian O'Mahoney
2005-04-18  7:43           ` Andy Isaacson
2005-04-18 17:04             ` C. Scott Ananian
2005-04-19 22:30             ` David Meybohm
2005-04-19 22:48               ` C. Scott Ananian
2005-04-20 18:56                 ` David Meybohm
2005-04-16 22:46         ` David Lang
2005-04-16 23:14           ` Paul Jackson
2005-04-16 22:33       ` David Lang
2005-04-17  3:23       ` Tkil [this message]
2005-04-17  4:09         ` Paul Jackson
2005-04-17  4:43           ` Tkil
2005-04-17  5:09             ` Paul Jackson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ghdi684sm.fsf@brand.scrye.com \
    --to=tkil@scrye.com \
    --cc=david.lang@digitalinsight.com \
    --cc=git@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=omb@bluewin.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.