From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: John Robinson <john.robinson@anonymous.org.uk>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
linux-raid@vger.kernel.org
Subject: Re: RFC: detection of silent corruption via ATA long sector reads
Date: Sun, 04 Jan 2009 02:37:23 -0500
Message-ID: <yq1eizj5xos.fsf@sermon.lab.mkp.net>
In-Reply-To: <495F6622.9010103@anonymous.org.uk> (John Robinson's message of "Sat, 03 Jan 2009 13:20:34 +0000")
>>>>> "John" == John Robinson <john.robinson@anonymous.org.uk> writes:
John> Excuse me if I'm being dense - and indeed tell me! - but RAID
John> 4/5/6 already suffer from having to do read-modify-write for
John> small writes, so is there any chance this could be done at
John> relatively little additional expense for these?
You'd still need to store a checksum somewhere else, incurring
additional seek cost. You could attempt to weasel out of that by placing
a checksum sector after every limited run of blocks and hoping that you'd
be able to read or write it in the same sweep as the data.
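To make the interleaved layout concrete, here is a minimal sketch of how a block number could map to its data sectors and to the slot in the checksum sector that trails its group. The 8-byte checksum size, the group size, and the names `locate` and `Location` are hypothetical choices for illustration, not anything MD actually implements.

```python
from __future__ import annotations

from dataclasses import dataclass

SECTOR = 512  # smallest addressable unit on the disk, in bytes


@dataclass(frozen=True)
class Location:
    data_sector: int  # first physical sector of the data block
    csum_sector: int  # physical sector holding this block's checksum
    csum_offset: int  # byte offset of the checksum within that sector


def locate(block: int, block_bytes: int = 8192, csum_bytes: int = 8,
           blocks_per_group: int = 16) -> Location:
    """Hypothetical layout: each group of data blocks is followed by one checksum sector."""
    block_sectors = block_bytes // SECTOR
    if blocks_per_group * csum_bytes > SECTOR:
        raise ValueError("checksums for one group do not fit in a single sector")
    group, idx = divmod(block, blocks_per_group)
    group_sectors = blocks_per_group * block_sectors + 1  # data sectors + 1 checksum sector
    start = group * group_sectors
    return Location(
        data_sector=start + idx * block_sectors,
        csum_sector=start + blocks_per_group * block_sectors,
        csum_offset=idx * csum_bytes,
    )
```

With these assumed defaults, block 17 lands at sector 273 and its checksum sits 8 bytes into sector 513, so a read of that block only avoids an extra seek if the drive can also reach sector 513 in the same sweep.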
The downside is this: assume we do checksums on, say, 8KB chunks in the
RAID5 case. We only need to store a few handfuls of bytes of checksum
goo per block. But we can't address less than a 512-byte sector. So we
either need to waste the bulk of one sector for every 16 data sectors to
increase the likelihood of adjacent access, or we can push the checksum
sector further out to fill it completely. That wastes less space but has
a higher chance of causing an extra seek. Pick your poison.
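The trade-off can be put into numbers. This is a minimal sketch assuming a hypothetical 8-byte checksum per 8KB block (the mail only says "a few handfuls of bytes"); `layout_cost` is an illustrative name. It reports the share of raw sectors spent on checksums, how full each checksum sector is, and how far the checksum sector can sit from the start of its group.

```python
from fractions import Fraction

SECTOR = 512  # bytes per addressable sector


def layout_cost(blocks_per_group: int, block_bytes: int = 8192, csum_bytes: int = 8):
    """Cost of placing one checksum sector after every `blocks_per_group` data blocks."""
    block_sectors = block_bytes // SECTOR            # 16 sectors for an 8KB block
    data = blocks_per_group * block_sectors          # data sectors per group
    overhead = Fraction(1, data + 1)                 # share of raw sectors spent on checksums
    fill = Fraction(blocks_per_group * csum_bytes, SECTOR)  # used fraction of the checksum sector
    spread = data  # sectors from the group's first data sector to its checksum sector
    return overhead, fill, spread


if __name__ == "__main__":
    for bpg in (1, 64):
        overhead, fill, spread = layout_cost(bpg)
        print(f"{bpg:>3} blocks/group: {float(overhead):.2%} of disk, "
              f"checksum sector {float(fill):.0%} full, up to {spread} sectors away")
```

Under these assumptions, one checksum sector per 8KB block costs 1/17 of the disk (about 5.9%) with the sector only 1/64 full, while packing 64 checksums per sector cuts the cost to 1/1025 (about 0.1%) but can put the checksum up to 1024 sectors (512KB) away from the data.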
The reason I'm advocating checksumming on logical (filesystem) blocks is
that the filesystems have a much better idea what's good and what's bad
in a recovery situation. And the filesystems already have an
infrastructure for storing metadata like checksums. The cost of
accessing that metadata is inherent and inevitable.
btrfs has had checksums from the get-go. The XFS folks are working hard on
adding them. ext4 is going to checksum metadata, I believe. So this is
stuff that's already in the pipeline.
We also don't want to do checksumming at every layer. That's going to
suck from a performance perspective. It's better to do checksumming
high up in the stack and only do it once, as long as we give the upper
layers the option of re-driving the I/O.
That involves adding a cookie to each bio that gets filled out by DM/MD
on completion. If the filesystem checksum fails we can resubmit the I/O
and pass along the cookie indicating that we want a different copy than
the one the cookie represents.
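The re-drive idea can be sketched in a few lines. This is a toy model under stated assumptions, not the kernel's bio API: `MirroredDevice`, `Completion`, and `fs_read` are hypothetical names, the "cookie" is simply the index of the mirror copy that served the read, and CRC32 stands in for whatever checksum the filesystem keeps. Accumulating every rejected cookie (rather than passing only the last one) is a design choice made here so the loop is guaranteed to terminate.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Completion:
    data: bytes
    cookie: int  # hypothetical: identifies which copy (mirror) served the read


class MirroredDevice:
    """Toy stand-in for an MD/DM RAID1 device holding several copies of each block."""

    def __init__(self, copies: list[dict[int, bytes]]):
        self.copies = copies

    def read(self, block: int, avoid: frozenset[int] = frozenset()) -> Optional[Completion]:
        # Serve the block from the first copy whose cookie the caller has not rejected.
        for cookie, copy in enumerate(self.copies):
            if cookie not in avoid:
                return Completion(copy[block], cookie)
        return None  # every copy has already been tried


def fs_read(dev: MirroredDevice, block: int, expected_crc: int) -> bytes:
    """Filesystem-side read: verify once at the top, re-drive the I/O on mismatch."""
    rejected: set[int] = set()
    while True:
        done = dev.read(block, frozenset(rejected))
        if done is None:
            raise OSError(f"block {block}: no copy matches the filesystem checksum")
        if zlib.crc32(done.data) == expected_crc:
            return done.data
        rejected.add(done.cookie)  # ask for a copy other than this one next time
```

Here the checksum is computed exactly once, in `fs_read`; the lower layer only reports which copy it used and honors the request to avoid it.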
--
Martin K. Petersen Oracle Linux Engineering