linux-raid.vger.kernel.org archive mirror
From: Phil Turmel <philip@turmel.org>
To: NeilBrown <neilb@suse.de>
Cc: Roman Mamedov <rm@romanrm.ru>,
	Joe Landman <joe.landman@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Wed, 16 Feb 2011 19:48:20 -0500
Message-ID: <4D5C7054.10000@turmel.org>
In-Reply-To: <20110217085924.01cdf22c@notabene.brown>

On 02/16/2011 04:59 PM, NeilBrown wrote:
> On Thu, 17 Feb 2011 02:44:02 +0500 Roman Mamedov <rm@romanrm.ru> wrote:
> 
>> On Thu, 17 Feb 2011 08:24:12 +1100
>> NeilBrown <neilb@suse.de> wrote:
>>
>>> "read/write/compare checksum" is not a lot of words so I may well not be
>>> understanding exactly what you mean, but I guess you are suggesting that we
>>> could store (say) a 64bit hash of each 4K block somewhere.
>>> e.g. Use 513 4K blocks to store 512 4K blocks of data with checksums.
>>> When reading a block, read the checksum too and report an error if they
>>> don't match.  When writing the block, calculate and write the checksum too.
>>>
>>> This is already done by the disk drive - I'm not sure what you hope to gain
>>> by doing it in the RAID layer as well.
>>
>> Consider RAID1/RAID10/RAID5/RAID6, where one or more members are returning bad
>> data for some reason (e.g. are failing or have written garbage to disk during
>> a sudden power loss). Having per-block checksums would allow to determine
>> which members have correct data and which do not, and would help the RAID
>> layer recover from that situation in the smartest way possible (with absolutely
>> no loss or corruption of the user data).
>>
> 
> Why do you think that md would be able to reliably write consistent data and
> checksum to a device in a circumstance (power failure) where the hard drive
> is not able to do it itself?

It wouldn't have to be a power failure.  A kernel panic wouldn't be recoverable,
either.

> i.e. I would need to see a clear threat-model which can cause data corruption
> that the hard drive itself would not be able to reliably report, but that
> checksums provided by md would be able to reliably report.
> Powerfail does not qualify (without sophisticated journalling on the part of
> md).

I agree that the hash itself is insufficient, but I don't think a full journal
is needed either.  If each hash had a timestamp and a short sequence number, and
was stored with copies of its siblings' sequence numbers, it would be possible
to work out which data was out of sync.  I admit that quantity of meta-data
would be exorbitant for 512B sectors, but might be acceptable for 4K blocks.  It
does vary with the number of raid devices, though.  I'll have to think about
ways to minimize that.
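
To put rough numbers on the meta-data cost, here is a back-of-the-envelope
sketch.  The field sizes are assumptions for illustration only: an 8-byte hash
(the 64-bit figure from the quoted example), an 8-byte timestamp, and 2-byte
sequence numbers.  Nothing here is an existing md on-disk format.

```python
# Hypothetical field sizes; none of these are part of any md format.
HASH_BYTES = 8        # 64-bit hash, as in the quoted 513-for-512 example
TIMESTAMP_BYTES = 8   # assumed timestamp width
SEQ_BYTES = 2         # assumed "short" sequence number


def record_bytes(n_devices: int) -> int:
    """Bytes of meta-data per block: hash, timestamp, own sequence number,
    plus one copied sequence number for each sibling device."""
    siblings = n_devices - 1
    return HASH_BYTES + TIMESTAMP_BYTES + SEQ_BYTES * (1 + siblings)


def overhead(n_devices: int, block_bytes: int) -> float:
    """Meta-data size as a fraction of the data block it describes."""
    return record_bytes(n_devices) / block_bytes


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n:2d} devices: {record_bytes(n):3d} B/record, "
              f"{overhead(n, 512):.2%} of a 512B sector, "
              f"{overhead(n, 4096):.2%} of a 4K block")
```

Under these assumed sizes the record grows by one sequence number per extra
device, and the same record costs eight times as much, relative to the data,
on 512B sectors as on 4K blocks.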

It would work for any situation where data in an MD member device's queue didn't
make it to the platter, and the platter retained the old data.  Of course, if the
number of devices with stale data in one stripe exceeds the failure tolerance
of the array, it still can't be fixed.  The algorithm could *revert* to old data
if the number of devices with new data was within the failure tolerance.  That
might be valuable.
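
A minimal sketch of that decision, assuming each member's record for a stripe
holds its own sequence number plus the sequence numbers it recorded for its
siblings at its last write.  The names (Record, classify, decide) are made up
for illustration, the hash and timestamp are left out, and sequence wrap-around
is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Record:
    seq: int                  # this member's own sequence number
    siblings: Dict[int, int]  # sibling index -> sequence number recorded for it


def classify(records: Dict[int, Record]) -> Tuple[Set[int], Set[int]]:
    """Return (stale, ahead) member sets for one stripe.

    A member is stale if some sibling recorded a higher sequence number for it
    than it holds itself.  A non-stale member is ahead if some stale member
    recorded a lower sequence number for it than it now holds.
    """
    stale = {
        j for j, r in records.items()
        if any(o.siblings.get(j, r.seq) > r.seq
               for i, o in records.items() if i != j)
    }
    ahead = {
        i for i, r in records.items()
        if i not in stale
        and any(records[j].siblings.get(i, r.seq) < r.seq for j in stale)
    }
    return stale, ahead


def decide(records: Dict[int, Record], tolerance: int) -> str:
    """Pick an action given the array's failure tolerance (1 for RAID5)."""
    stale, ahead = classify(records)
    if not stale:
        return "consistent"
    if len(stale) <= tolerance:
        return "roll-forward"   # rebuild the stale members from the new data
    if len(ahead) <= tolerance:
        return "revert"         # rebuild the new members from the old data
    return "unrecoverable"


def stripe(seqs: Dict[int, int]) -> Dict[int, Record]:
    """Helper: each member recorded, for every sibling, its own sequence number
    (i.e. it assumed the whole stripe was written together)."""
    return {i: Record(s, {j: s for j in seqs if j != i})
            for i, s in seqs.items()}


if __name__ == "__main__":
    # 4-device RAID5, full-stripe write moving sequence 7 -> 8, interrupted.
    print(decide(stripe({0: 8, 1: 8, 2: 8, 3: 7}), tolerance=1))
    print(decide(stripe({0: 8, 1: 7, 2: 7, 3: 7}), tolerance=1))
    print(decide(stripe({0: 8, 1: 8, 2: 7, 3: 7}), tolerance=1))
```

The three calls cover one member missing the write, only one member receiving
it, and an even split that exceeds a single-failure tolerance either way.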

Phil

