From: Chris Dunlop <chris@onthe.net.au>
To: Sage Weil <sage@newdream.net>
Cc: Allen Samuels <Allen.Samuels@sandisk.com>,
Igor Fedotov <ifedotov@mirantis.com>,
ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Adding compression/checksum support for bluestore.
Date: Fri, 1 Apr 2016 16:28:38 +1100
Message-ID: <20160401052838.GA8044@onthe.net.au>
In-Reply-To: <alpine.DEB.2.11.1604010055050.22014@cpach.fuggernut.com>
On Fri, Apr 01, 2016 at 12:56:48AM -0400, Sage Weil wrote:
> On Fri, 1 Apr 2016, Chris Dunlop wrote:
>> On Wed, Mar 30, 2016 at 10:52:37PM +0000, Allen Samuels wrote:
>>> One thing to also factor in is that if you increase the span of a
>>> checksum, you degrade the quality of the checksum. So if you go with 128K
>>> chunks of data you'll likely want to increase the checksum itself from
>>> something beyond a CRC-32. Maybe somebody out there has a good way of
>>> describing this quantitatively.
>>
>> I would have thought the "quality" of a checksum would be a function of how
>> many bits it is, and how evenly and randomly it's distributed, and unrelated
>> to the amount of data being checksummed.
>>
>> I.e. if you have any amount of data covered by an N-bit evenly randomly
>> distributed checksum, and "something" goes wrong with the data (or the
>> checksum), the chance of the checksum still matching the data is 1 in 2^N.
>
> Say there is some bit error rate per bit. If you double the amount of
> data you're checksumming, then you'll see twice as many errors. That
> means that even though your 32-bit checksum is right 2^32-1 times out of
> 2^32, you're twice as likely to hit that 1 in 2^32 chance of getting a
> correct checksum on wrong data.

It seems to me that if we're talking about a single block of data protected
by a 32-bit checksum, it doesn't matter how many errors there are within the
block: the chance of a false checksum match is still only 1 in 2^32.

If we're talking about a stream of checksummed blocks, where the stream is
subject to some BER, then, yes, your chances of getting a false match go up.
But that's still independent of the block size; rather, it's a function of
the number of possibly corrupt blocks.

In fact, if you have a stream of data subject to some BER and split into
checksummed blocks, the larger the blocks, and thereby the lower the number
of blocks, the lower the chance of a false match.

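As a rough sketch of that arithmetic: the snippet below assumes an ideal checksum, so any corrupted block matches with probability 2^-N regardless of how many bits are wrong, and assumes independent bit errors. The function name, the 1 TiB stream size and the BER of 1e-12 are illustrative choices only, not figures from this thread.

```python
import math

def expected_false_matches(stream_bits, block_bits, ber, checksum_bits=32):
    """Expected number of corrupt blocks whose checksum still matches.

    Assumes an ideal checksum (a corrupt block matches with probability
    2^-checksum_bits) and independent bit errors at rate `ber`.
    """
    n_blocks = stream_bits / block_bits
    # P(block has at least one bit error) = 1 - (1 - ber)^block_bits,
    # computed via log1p/expm1 to stay accurate for very small ber.
    p_corrupt = -math.expm1(block_bits * math.log1p(-ber))
    return n_blocks * p_corrupt * 2.0 ** -checksum_bits

if __name__ == "__main__":
    STREAM_BITS = 8 * 2**40   # hypothetical 1 TiB stream
    BER = 1e-12               # hypothetical bit error rate
    for block_bytes in (4 * 1024, 128 * 1024, 4 * 1024 * 1024):
        e = expected_false_matches(STREAM_BITS, 8 * block_bytes, BER)
        print(f"{block_bytes:>8} byte blocks: {e:.6e} expected false matches")
```

Under these assumptions the expected count is bounded above by stream_bits * ber * 2^-N and never increases as the blocks get larger, since a block with many bit errors still counts as only one possibly corrupt block.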
Chris