From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Neil Brown <neilb@suse.de>, Eyal Lebedinsky <eyal@eyal.emu.id.au>,
	Christian Pernegger <pernegger@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: mismatch_cnt questions
Date: Thu, 08 Mar 2007 08:54:10 -0500
Message-ID: <yq1tzwvj0xp.fsf@sermon.lab.mkp.net>
In-Reply-To: <45EFAFB8.3070703@zytor.com> (H. Peter Anvin's message of "Wed, 07 Mar 2007 22:39:52 -0800")

>>>>> "hpa" == H Peter Anvin <hpa@zytor.com> writes:

>> What we really want is drives that store 520 byte sectors so that a
>> checksum can be passed all the way up and down through the stack
>> .... or something like that.
>> 

hpa> A lot of SCSI disks have that option, but I believe it's not
hpa> arbitrary bytes.  In particular, the integrity check portion is
hpa> only 2 bytes, 16 bits.

It's important to distinguish between drives that support 520 byte
sectors and drives that include the Data Integrity Feature which also
uses 520 byte sectors.

Most regular SCSI drives can be formatted with 520 byte sectors and a
lot of disk arrays use the extra space to store an internal checksum.
The downside to 520 byte sectors is that it makes buffer management a
pain as 512 bytes of data is followed by 8 bytes of protection data.
That sucks when writing - say - a 4KB block because your scatterlist
becomes long and twisted having to interleave data and protection
data every sector.
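
To make the buffer layout concrete, here is a minimal sketch of the interleaving for a 4KB block. The function names are made up for illustration and this is not kernel code; it only shows that 8 sectors of data plus 8 bytes each of protection data turn into 16 alternating pieces.

```python
SECTOR_DATA = 512  # bytes of user data per sector
SECTOR_PI = 8      # bytes of protection data per sector


def interleave(data: bytes, pi: bytes) -> bytes:
    """Build the 520-bytes-per-sector layout from two separate buffers.

    Hypothetical helper: `data` holds whole 512-byte sectors and `pi`
    holds the matching 8-byte protection fields, back to back.
    """
    if len(data) % SECTOR_DATA:
        raise ValueError("data must be a whole number of 512-byte sectors")
    nsect = len(data) // SECTOR_DATA
    if len(pi) != nsect * SECTOR_PI:
        raise ValueError("need exactly 8 bytes of protection data per sector")
    out = bytearray()
    for i in range(nsect):
        out += data[i * SECTOR_DATA:(i + 1) * SECTOR_DATA]
        out += pi[i * SECTOR_PI:(i + 1) * SECTOR_PI]
    return bytes(out)


def interleaved_segments(nsect: int) -> int:
    """Pieces needed when data and protection data alternate per sector."""
    return 2 * nsect


if __name__ == "__main__":
    block = bytes(4096)      # one 4KB block = 8 sectors
    prot = bytes(range(64))  # 8 sectors x 8 bytes of protection data
    wire = interleave(block, prot)
    print(len(wire), interleaved_segments(8))  # 4160 16
```

With separate buffers the same I/O would need only two contiguous regions, one for data and one for protection data.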

The Data Integrity Feature also uses 520 byte sectors.  The
difference is that the format of the 8 bytes is well defined, and
that both initiator and target are capable of verifying the integrity
of an I/O.  It is correct that the CRC is only 16 bits.
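
As a rough sketch of what "well defined" means, the following builds and checks one 8-byte protection tuple. The layout used here (16-bit guard tag holding a CRC with polynomial 0x8BB7, 16-bit application tag, 32-bit reference tag, all big-endian, with the reference tag carrying the low 32 bits of the LBA) reflects my understanding of the T10 specification and is not spelled out in this message. The function names are hypothetical.

```python
import struct

T10_DIF_POLY = 0x8BB7  # assumed guard-tag CRC polynomial (T10 DIF)


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Pack guard tag, application tag and reference tag into 8 bytes."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def verify(sector: bytes, pi: bytes, lba: int) -> bool:
    """Check that the data matches the guard tag and the LBA matches the ref tag."""
    guard, _app_tag, ref = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)


if __name__ == "__main__":
    sector = bytes(range(256)) * 2  # 512 bytes of sample data
    pi = make_tuple(sector, lba=1234)
    print(pi.hex(), verify(sector, pi, 1234), verify(sector, pi, 1235))
```

Because the same check can run on either end, a sector written to the wrong LBA or corrupted in flight can be rejected before it reaches the platter.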

DIF is strictly between HBA and disk.  I'm lobbying HBA vendors to
expose it to the OS so we can use it.  I'm also lobbying to get them
to allow us to submit the data and the protection data in separate
scatterlists so we don't have to do the interleaving at the OS level.


hpa> One option, of course, would be to store, say, 16
hpa> sectors/pages/blocks in 17 physical sectors/pages/blocks, where
hpa> the last one is a packing of some sort of high-powered integrity
hpa> checks, e.g. SHA-256, or even an ECC block.  This would hurt
hpa> performance substantially, but it would be highly useful for very
hpa> high data integrity applications.
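
For what it's worth, the arithmetic of the 16-in-17 idea works out neatly with SHA-256: 16 digests of 32 bytes each fill exactly one 512-byte sector. The sketch below assumes 512-byte sectors and a checksum sector placed after each group of 16; the helper names and the placement are illustrative only, not something anyone in this thread has specified.

```python
import hashlib

SECTOR = 512
GROUP = 16                               # data sectors per checksum sector
DIGEST = hashlib.sha256().digest_size    # 32 bytes, so 16 * 32 == 512


def checksum_sector(sectors: list) -> bytes:
    """Pack the SHA-256 of each of the 16 data sectors into one sector."""
    if len(sectors) != GROUP or any(len(s) != SECTOR for s in sectors):
        raise ValueError("expected 16 sectors of 512 bytes")
    return b"".join(hashlib.sha256(s).digest() for s in sectors)


def find_bad(sectors: list, csum: bytes) -> list:
    """Return the indexes of data sectors whose digest does not match."""
    return [
        i for i, s in enumerate(sectors)
        if hashlib.sha256(s).digest() != csum[i * DIGEST:(i + 1) * DIGEST]
    ]


def physical_sector(logical: int) -> int:
    """Map a logical data sector to its physical position (17 per group)."""
    return logical + logical // GROUP


def checksum_location(logical: int) -> int:
    """Physical position of the checksum sector covering `logical`."""
    return (logical // GROUP) * (GROUP + 1) + GROUP


if __name__ == "__main__":
    group = [bytes([i]) * SECTOR for i in range(GROUP)]
    csum = checksum_sector(group)
    group[5] = b"\xff" * SECTOR          # simulate silent corruption
    print(len(csum), find_bad(group, csum))
```

The extra read of the checksum sector for every group is where the performance cost comes from.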

A while ago I tinkered with something like that.  I actually cheated
and stored the checking data in a different partition on the same
drive.  It was a pretty simple test using my DIF code (i.e. 8 bytes
per sector).

I wanted to see how badly the extra seeks would affect us.  The
results weren't too discouraging but I decided I liked the ZFS
approach better (having the checksum in the fs parent block which
you'll be reading anyway).
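
A toy sketch of the parent-block idea, under the assumption that the pointer to a block carries that block's checksum. This is not ZFS code, SHA-256 is only a stand-in for whatever checksum the filesystem uses, and all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


class ChecksumError(Exception):
    """Raised when a block does not match the checksum held by its parent."""


@dataclass
class BlockPointer:
    address: int     # where the child block lives
    checksum: bytes  # checksum of the child, stored in the parent


class Store:
    """In-memory stand-in for a disk."""

    def __init__(self) -> None:
        self.blocks = {}

    def write(self, address: int, data: bytes) -> BlockPointer:
        self.blocks[address] = data
        return BlockPointer(address, hashlib.sha256(data).digest())

    def read(self, ptr: BlockPointer) -> bytes:
        data = self.blocks[ptr.address]
        # The parent was read to find ptr, so the checksum costs no extra seek.
        if hashlib.sha256(data).digest() != ptr.checksum:
            raise ChecksumError(f"block {ptr.address} failed verification")
        return data


if __name__ == "__main__":
    store = Store()
    ptr = store.write(7, b"file data")
    print(store.read(ptr) == b"file data")
```

The point is that verification rides along with a read that has to happen anyway, instead of seeking to a separate checksum area.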

-- 
Martin K. Petersen	Oracle Linux Engineering

