From: Jeff Woods <kazrak+kernel@cesmail.net>
To: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Luciano Miguel Ferreira Rocha <luciano@lsd.di.uminho.pt>,
linux-c-programming@vger.kernel.org
Subject: Re: Question about checksums
Date: Thu, 21 Aug 2003 10:28:46 -0700
Message-ID: <5.2.1.1.0.20030821100717.032dfb30@no.incoming.mail>
In-Reply-To: <Pine.LNX.4.44.0308211617560.6771-100000@praktifix.dwd.de>
At 04:36 PM 8/21/2003 +0000, Holger Kiehl wrote:
>On Thu, 21 Aug 2003, Luciano Miguel Ferreira Rocha wrote:
>>On Thu, Aug 21, 2003 at 12:48:05PM +0000, Holger Kiehl wrote:
[snip]
>>>I think md5sum could do the job, but I think it is a bit of overkill to
>>>generate a 128 bit checksum for such small input data. Also, storing such
>>>huge numbers is a bit of a pain. Would a 32 or 64 bit checksum be
>>>sufficient, or would I run into problems because these are too short?
>>
>>CRC-32 is normally sufficient. It's designed to detect data corruption in
>>transmission, but it should be OK as long as you don't expect people to
>>try to break your code by deliberately producing equal checksums.
>
>I am not trying to make anything more secure. Will a CRC-32 be sufficient
>to always generate a different sum if a single bit changes within the
>maximum 5120 Bytes?

In general, X bits of storage can take on 2^X distinct values. So a CRC-32
can take at most 2^32, or approximately four billion, distinct values.
That's a number with three commas in US notation (4,294,967,296); I suppose
that's three periods or spaces on your side of the pond. A 128 bit value can
take approximately 340 trillion trillion trillion (about 3.4*10^38) distinct
values. That's a number with *twelve* commas. And a 5120 byte file has
40960 bits, so it can have 2^40960, roughly 10^12330, distinct values.
Whenever the data is longer than the checksum, there will always be the
possibility of different data producing the same checksum.

You have to make a tradeoff of how much risk of a duplicate you're willing
to accept, based on how a duplicate would affect you. With a 32 bit
checksum, two different files have *at least* a one in four billion chance
of sharing the same checksum (exactly that for an ideal checksum, more for a
weaker one). For most non-security applications, that's ample. The same
possibility exists with 128 bit md5 checksums (or any other hash), but the
larger the checksum, the less often different data will produce duplicate
checksums (assuming hash algorithms of comparable quality).

One way to make such use of checksums fail-safe is to treat a checksum
mismatch as proof that files are different, but not to treat a match as
proof that they are the same. When the checksums match, you don't really
know the files are the same until you compare their actual contents. A
false match should be rare (on the order of once in four billion
comparisons with a good 32 bit checksum), so if it's really critical to
know for certain, you can probably afford to go check.
--
Jeff Woods <kazrak+kernel@cesmail.net>