linux-c-programming.vger.kernel.org archive mirror
From: Jeff Woods <kazrak+kernel@cesmail.net>
To: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Luciano Miguel Ferreira Rocha <luciano@lsd.di.uminho.pt>,
	linux-c-programming@vger.kernel.org
Subject: Re: Question about checksums
Date: Thu, 21 Aug 2003 10:28:46 -0700
Message-ID: <5.2.1.1.0.20030821100717.032dfb30@no.incoming.mail>
In-Reply-To: <Pine.LNX.4.44.0308211617560.6771-100000@praktifix.dwd.de>

At +0000 04:36 PM 8/21/2003, Holger Kiehl wrote:
>On Thu, 21 Aug 2003, Luciano Miguel Ferreira Rocha wrote:
>>On Thu, Aug 21, 2003 at 12:48:05PM +0000, Holger Kiehl wrote:
[snip]
>>>I think md5sum could do the job, but I think it is a bit of overkill to
>>>generate a 128-bit checksum for such small input data. Also, storing such
>>>huge numbers is a bit of a pain. Would a 32- or 64-bit checksum be
>>>sufficient, or would I run into problems because these are too short?
>>
>>CRC-32 is normally sufficient. It's designed to detect data corruption in
>>transmission, though, so it should be OK only as long as you don't expect
>>people to try to break your code with equal checksums.
>
>I am not trying to make anything more secure. Will a CRC-32 be sufficient 
>to always generate a different sum if a single bit changes within the 
>maximum 5120 Bytes?

In general, X bits of storage can take on 2^X distinct values.  So a CRC-32
can take on about 4.3 billion (2^32) possible values.  That's a number with
three commas in US notation; I suppose that's three periods or spaces on
your side of the pond.  A 128-bit value can hold approximately 340 trillion
trillion trillion (2^128, about 3.4*10^38) distinct values.  That's a
number with *twelve* commas.  And a 5120-byte file has 40960 bits, so it can
have 2^40960, or roughly 10^12330, distinct values.  There will always be
the possibility of duplicate values when you take a checksum of arbitrary
data longer than the checksum itself.
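
To double-check the sizes of those value spaces, here is a minimal C sketch. The helper names (decimal_digits_of_pow2, separators_for_digits, print_value_space) are made up for this illustration, and log10(2) is hard-coded as an assumption so that nothing beyond the standard C library is needed. It estimates how many decimal digits 2^bits has and how many thousands separators it takes to write that number out.

```c
#include <stdio.h>

/* log10(2), hard-coded so this sketch does not need the math library. */
#define LOG10_2 0.30102999566398120

/* Number of decimal digits in 2^bits (hypothetical helper).
 * 2^bits is never an exact power of ten for bits > 0, so
 * floor(bits * log10(2)) + 1 gives the digit count. */
static unsigned long decimal_digits_of_pow2(unsigned long bits)
{
    return (unsigned long)((double)bits * LOG10_2) + 1;
}

/* Thousands separators needed to write a number with that many digits. */
static unsigned long separators_for_digits(unsigned long digits)
{
    return digits ? (digits - 1) / 3 : 0;
}

/* Print one summary line for a checksum or file of the given bit length. */
static void print_value_space(const char *label, unsigned long bits)
{
    unsigned long digits = decimal_digits_of_pow2(bits);

    printf("%-12s 2^%lu has %lu digits and %lu separators\n",
           label, bits, digits, separators_for_digits(digits));
}
```

Calling print_value_space("CRC-32", 32), print_value_space("MD5", 128) and print_value_space("5120 bytes", 40960) from main() reports 10, 39 and 12331 digits respectively, so the space of possible files is vastly larger than either checksum can distinguish.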

You have to trade off how much risk of a duplicate you're willing to accept
against how a duplicate would affect you.  For arbitrary data, even an ideal
32-bit checksum leaves about a one in 4 billion chance that two different
files share the same checksum; a weaker algorithm does worse.  For most
non-security applications, that's ample.  The same possibility exists with
128-bit md5 checksums (or any other hash), but the larger the checksum, the
less often you'll get duplicate checksums for different data (assuming hash
algorithms of comparable quality).

One way to make such use of checksums fail-safe is to treat a checksum
mismatch as proof that the files are different, but not to treat a match as
proof that they are the same.  When the checksums match, you don't really
know the files are the same until you compare their actual contents.  Since
a false match should turn up only about once in four billion comparisons of
differing files, you can probably afford to go check the contents whenever
the checksums match, if it's really critical to know for certain.
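
The mismatch-proves-difference idea above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: crc32_bitwise and buffers_identical are hypothetical names, and the CRC uses the common reflected polynomial 0xEDB88320 (the variant used by zlib and Ethernet), which may not be the variant you end up choosing. A differing length or checksum settles the question immediately; a match is only a hint and is confirmed with memcmp().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320).
 * Slow but short; a table-driven version is the usual optimization. */
static uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 only if the two buffers are byte-for-byte identical.
 * The stored checksums are used purely as a cheap early exit. */
static int buffers_identical(const unsigned char *a, size_t a_len, uint32_t a_crc,
                             const unsigned char *b, size_t b_len, uint32_t b_crc)
{
    /* Different length or checksum: the contents cannot be the same. */
    if (a_len != b_len || a_crc != b_crc)
        return 0;

    /* Same checksum: probably identical, but confirm with a full compare. */
    return memcmp(a, b, a_len) == 0;
}
```

The checksum is computed once per buffer and stored alongside it, so the expensive byte-by-byte comparison runs only when the cheap test cannot rule out equality.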

--
Jeff Woods <kazrak+kernel@cesmail.net> 




Thread overview: 9+ messages
2003-08-21 12:48 Question about checksums Holger Kiehl
2003-08-21 13:20 ` Luciano Miguel Ferreira Rocha
2003-08-21 16:36   ` Holger Kiehl
2003-08-21 17:28     ` Jeff Woods [this message]
2003-08-22 20:18       ` Holger Kiehl
2003-08-23 20:31         ` printf(), aligning fields J.
2003-08-24  0:07           ` Glynn Clements
2003-08-24  1:05             ` Stephen Satchell
2003-08-21 18:19     ` Question about checksums Luciano Miguel Ferreira Rocha
