From: Bill Davidsen <davidsen@tmr.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Neil Brown <neilb@suse.de>,
Eyal Lebedinsky <eyal@eyal.emu.id.au>,
Christian Pernegger <pernegger@gmail.com>,
linux-raid@vger.kernel.org
Subject: Re: mismatch_cnt questions
Date: Thu, 08 Mar 2007 21:00:26 -0500
Message-ID: <45F0BFBA.5010201@tmr.com>
In-Reply-To: <yq1tzwvj0xp.fsf@sermon.lab.mkp.net>
Martin K. Petersen wrote:
>>>>>> "hpa" == H Peter Anvin <hpa@zytor.com> writes:
>>>>>>
>
>
>>> What we really want is drives that store 520 byte sectors so that a
>>> checksum can be passed all the way up and down through the stack
>>> .... or something like that.
>>>
>>>
>
> hpa> A lot of SCSI disks have that option, but I believe it's not
> hpa> arbitrary bytes. In particular, the integrity check portion is
> hpa> only 2 bytes, 16 bits.
>
> It's important to distinguish between drives that support 520 byte
> sectors and drives that include the Data Integrity Feature which also
> uses 520 byte sectors.
>
> Most regular SCSI drives can be formatted with 520 byte sectors and a
> lot of disk arrays use the extra space to store an internal checksum.
> The downside to 520 byte sectors is that it makes buffer management a
> pain as 512 bytes of data is followed by 8 bytes of protection data.
> That sucks when writing - say - a 4KB block because your scatterlist
> becomes long and twisted having to interleave data and protection
> data every sector.
>
> The data integrity feature also uses 520 byte sectors. The
> difference is that the format of the 8 bytes is well defined. And
> that both initiator and target are capable of verifying the integrity
> of an I/O. It is correct that the CRC is only 16 bits.
>
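The quoted text says the 8 bytes are well defined but does not spell
them out. As far as I understand the T10 layout, they are a 2 byte guard
tag (the CRC over the 512 data bytes, polynomial 0x8BB7), a 2 byte
application tag and a 4 byte reference tag, all big-endian. The sketch
below is an illustration under that assumption, not driver code; the
function names are made up for the example.

```python
import struct

def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def protection_tuple(sector: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Pack guard tag, application tag and reference tag into 8 bytes."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, ref_tag)

def guard_matches(sector: bytes, tuple8: bytes) -> bool:
    """Recompute the CRC and compare it with the stored guard tag."""
    guard, _app_tag, _ref_tag = struct.unpack(">HHI", tuple8)
    return guard == crc16_t10dif(sector)

sector = bytes(range(256)) * 2                # 512 bytes of sample data
pi = protection_tuple(sector, app_tag=0, ref_tag=1234)
print(len(sector) + len(pi))                  # 520
print(guard_matches(sector, pi))              # True
```

Either end of the link can run the same check, which is what lets both
initiator and target verify an I/O.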
When I last looked at Hamming codes, and that would be 1989 or 1990, I
believe I learned that the number of Hamming bits needed to cover N
data bits was 1+log2(N), which for 512 bytes (4096 bits) would be
1+12 = 13, and fit into a 16 bit field nicely. I don't know that I would
go that way (fix any one bit error, detect any two bit error) rather than
use a CRC, which leaves me only one chance in 64k of an undetected data
error, but I find it interesting.
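
As a rough check on that arithmetic: a single-error-correcting Hamming
code over m data bits needs the smallest r with 2^r >= m + r + 1, and
detecting double errors as well costs one extra overall parity bit. A
minimal sketch (the function name is made up for the example):

```python
def hamming_check_bits(data_bits: int, detect_double: bool = False) -> int:
    """Check bits needed for a Hamming code over data_bits of payload.

    Finds the smallest r with 2**r >= data_bits + r + 1 (single-error
    correction), and adds one overall parity bit when double-error
    detection is also wanted.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1 if detect_double else r

sector_bits = 512 * 8
print(hamming_check_bits(sector_bits))        # correct one bit error
print(hamming_check_bits(sector_bits, True))  # also detect two bit errors
```

For a 512 byte sector this gives 13 bits for correction alone and 14
with double-error detection, so either variant still fits in 16 bits.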
I also looked at Fire codes, which at the time would still have been a
viable topic for a thesis. I remember nothing about how they worked
whatsoever.
> DIF is strictly between HBA and disk. I'm lobbying HBA vendors to
> expose it to the OS so we can use it. I'm also lobbying to get them
> to allow us to submit the data and the protection data in separate
> scatterlists so we don't have to do the interleaving at the OS level.
>
>
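
To put a number on the scatterlist point, here is a small sketch that
counts segments for one 4KB block. It assumes the data and the
protection data each sit in their own contiguous buffer; the tuple
format and function names are invented for the illustration and are not
the kernel's scatterlist API.

```python
SECTOR_DATA = 512   # bytes of user data per sector
SECTOR_PROT = 8     # bytes of protection data per sector

def interleaved_segments(block_bytes: int):
    """One list in which every sector's data is followed by its 8 bytes."""
    segments = []
    for i in range(block_bytes // SECTOR_DATA):
        segments.append(("data", i * SECTOR_DATA, SECTOR_DATA))
        segments.append(("prot", i * SECTOR_PROT, SECTOR_PROT))
    return segments

def separate_segments(block_bytes: int):
    """Two lists: one for all the data, one for all the protection data."""
    sectors = block_bytes // SECTOR_DATA
    data_list = [("data", 0, sectors * SECTOR_DATA)]
    prot_list = [("prot", 0, sectors * SECTOR_PROT)]
    return data_list, prot_list

mixed = interleaved_segments(4096)
data_list, prot_list = separate_segments(4096)
print(len(mixed))                         # 16
print(len(data_list) + len(prot_list))    # 2
```

Both layouts move the same 4160 bytes; the interleaved one simply needs
two entries per sector instead of two entries per block.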
> hpa> One option, of course, would be to store, say, 16
> hpa> sectors/pages/blocks in 17 physical sectors/pages/blocks, where
> hpa> the last one is a packing of some sort of high-powered integrity
> hpa> checks, e.g. SHA-256, or even an ECC block. This would hurt
> hpa> performance substantially, but it would be highly useful for very
> hpa> high data integrity applications.
>
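
The 16-in-17 packing works out neatly for SHA-256 if the unit is a 512
byte sector: 16 digests of 32 bytes each fill exactly one more sector.
The sketch below assumes that sector size and a simple
concatenated-digest layout; both are choices made for the example, not
something proposed in the thread.

```python
import hashlib

SECTOR = 512    # assumed sector size in bytes
GROUP = 16      # data sectors covered by one check sector
DIGEST = 32     # SHA-256 digest size in bytes

def build_check_sector(data: bytes) -> bytes:
    """Concatenate the SHA-256 digest of each of the 16 data sectors."""
    if len(data) != SECTOR * GROUP:
        raise ValueError("expected exactly 16 sectors of data")
    return b"".join(
        hashlib.sha256(data[i:i + SECTOR]).digest()
        for i in range(0, len(data), SECTOR)
    )

def find_bad_sectors(data: bytes, check: bytes) -> list:
    """Return the indices of sectors whose digest no longer matches."""
    bad = []
    for i in range(GROUP):
        digest = hashlib.sha256(data[i * SECTOR:(i + 1) * SECTOR]).digest()
        if digest != check[i * DIGEST:(i + 1) * DIGEST]:
            bad.append(i)
    return bad

data = bytes(SECTOR * GROUP)
check = build_check_sector(data)
print(len(check))                         # 512, i.e. one extra sector
```

The space cost is one sector in 17; the extra read or write of the check
sector is where the performance hit mentioned above would come from.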
> A while ago I tinkered with something like that. I actually cheated
> and stored the checking data in a different partition on the same
> drive. It was a pretty simple test using my DIF code (i.e. 8 bytes
> per sector).
>
> I wanted to see how badly the extra seeks would affect us. The
> results weren't too discouraging but I decided I liked the ZFS
> approach better (having the checksum in the fs parent block which
> you'll be reading anyway).
>
>
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979