From: David Brown <david.brown@hesbynett.no>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: Andrea Mazzoleni <amadvance@gmail.com>,
linux-raid@vger.kernel.org, linux-btrfs@vger.kernel.org,
hpa@zytor.com, creamyfish@gmail.com
Subject: Re: Triple parity and beyond
Date: Fri, 22 Nov 2013 01:32:09 +0100
Message-ID: <528EA609.9060702@hesbynett.no>
In-Reply-To: <20131121205229.GA2458@lazy.lzy>

On 21/11/13 21:52, Piergiorgio Sartor wrote:
> Hi David,
>
> On Thu, Nov 21, 2013 at 09:31:46PM +0100, David Brown wrote:
> [...]
>> If this can all be done to give the user an informed choice, then it
>> sounds good.
>
> that would be my target.
> To _offer_ more options to the (advanced) user.
> It _must_ always be under user control.
>
>> One issue here is whether the check should be done with the filesystem
>> mounted and in use, or only off-line. If it is off-line then it will
>> mean a long down-time while the array is checked - but if it is online,
>> then there is the risk of confusing the filesystem and caches by
>> changing the data.
>
> Currently, "raid6check" can work with FS mounted.
> I got the suggestion from Neil (of course).
> It is possible to lock one stripe and check it.
> This should be, at any given time, consistent
> (that is, the parity should always match the data).
> If an error is found, it is reported.
> Again, the user can decide to fix it or not,
> considering all the FS consequences and so on.
>
If you can lock stripes, and make sure any old data from that stripe is
flushed from the caches (if you change it while locked), then that
sounds ideal.
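
As a rough illustration of what checking one locked stripe could involve, here is a minimal sketch in Python. It is not taken from raid6check. It assumes the usual RAID-6 arithmetic (GF(2^8) with polynomial 0x11d and generator 2), and the function names `syndromes` and `suspect` are made up for this example. It recomputes the P and Q syndromes and, when exactly one data block is wrong, derives the likely disk index from their ratio.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2 (assumed).
GF_EXP = [0] * 510
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    GF_EXP[i] = GF_EXP[i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data, p, q):
    """Return (P syndrome, Q syndrome) for one stripe.

    data: list of equal-length bytes objects, one per data disk.
    With p and q all zero, the result is the parity itself.
    """
    ps, qs = bytearray(p), bytearray(q)
    for d, block in enumerate(data):
        g = GF_EXP[d]  # coefficient 2**d for data disk d
        for i, byte in enumerate(block):
            ps[i] ^= byte
            qs[i] ^= gf_mul(g, byte)
    return ps, qs


def suspect(data, p, q):
    """Guess which single block is wrong; never modifies anything."""
    ps, qs = syndromes(data, p, q)
    if not any(ps) and not any(qs):
        return "consistent"
    if not any(qs):
        return "P"  # data agrees with Q, so only P looks wrong
    if not any(ps):
        return "Q"  # data agrees with P, so only Q looks wrong
    candidates = set()
    for a, b in zip(ps, qs):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return "unknown"  # not explainable by one bad data block
        candidates.add((GF_LOG[b] - GF_LOG[a]) % 255)
    if len(candidates) == 1:
        z = candidates.pop()
        if z < len(data):
            return z  # index of the suspected data disk
    return "unknown"
```

The result is only a suggestion: a partly written stripe or errors on more than one disk can produce the same syndromes, so the decision to rewrite anything still belongs to the user.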
>> Most disk errors /are/ detectable, and are reported by the underlying
>> hardware - small surface errors are corrected by the disk's own error
>> checking and correcting mechanisms, and larger errors are usually
>> detected. It is (or should be!) very rare that a read error goes
>> undetected without there being a major problem with the disk controller.
>> And if the error is detected, then the normal raid processing kicks in
>> as there is no doubt about which block has problems.
>
> That's clear. That case is an "erasure" (I think)
> and it is perfectly in line with the usual operation.
> I'm not trying to replace this mechanism.
>
>> If you can be /sure/ about which data block is incorrect, then I agree -
>> but you can't be /entirely/ sure. But I agree that you can make a good
>> enough guess to recommend a fix to the user - as long as it is not
>> automatic.
>
> One typical case is when many errors are
> found, belonging to the same disk.
> This case clearly shows the disk is to be
> replaced or the interface checked...
> But, again, the user is the master, not the
> machine... :-)
I don't know what sort of interface you have for the user, but I guess
that means you'll have to collect a number of failures before showing
them so that the user can see the correlation on disk number.
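
A minimal sketch of that kind of collection, in Python, might look like the following. The report format (a list of stripe number and suspected disk pairs) and the `min_hits` threshold are assumptions made for the example, not anything raid6check is known to provide.

```python
from collections import Counter


def summarize(reports, min_hits=3):
    """Tally located mismatches per disk before showing them to the user.

    reports: iterable of (stripe_number, suspected_disk) pairs, where
    suspected_disk is None when the check could not point at one disk.
    min_hits is an arbitrary threshold for flagging a recurring disk.
    """
    per_disk = Counter(disk for _, disk in reports if disk is not None)
    total = sum(per_disk.values())
    lines = []
    for disk, hits in per_disk.most_common():
        flag = " <-- recurring" if hits >= min_hits else ""
        lines.append(f"disk {disk}: {hits} of {total} located mismatches{flag}")
    return per_disk, lines


if __name__ == "__main__":
    example = [(10, 2), (11, 2), (57, 2), (90, 0), (120, None)]
    for line in summarize(example)[1]:
        print(line)
```

Nothing here repairs anything. It only groups the findings so that a pattern on one disk (or one controller port) stands out before the user decides what to do.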
>
>> For most ECC schemes, you know that all your blocks are set
>> synchronously - so any block that does not fit in, is an error. With
>> raid, it could also be that a stripe is only partly written - you can
>
> Could it be?
> I would consider this an error.
It could occur as the result of a failure of some sort (kernel crash,
power failure, temporary disk problem, etc.). More generally, md raid
doesn't have to be on local physical disks - maybe one of the "disks" is
an iSCSI drive or something else over a network that could have failures
or delays. I haven't thought through all cases here - I am just
throwing them out as possibilities that might cause trouble.
> The stripe must always be consistent, there
> should be a transactional mechanism to make
> sure that, if read back, the data is always
> matching the parity.
> When I write "read back" I mean from whatever
> the data is: physical disk or cache.
> Otherwise, the check must run exclusively on
> the array (no mounted FS, no other things
> running on it).
>
>> have two different valid sets of data mixed to give an inconsistent
>> stripe, without any good way of telling what consistent data is the best
>> choice.
>>
>> Perhaps a checking tool can take advantage of a write-intent bitmap (if
>> there is one) so that it knows if an inconsistent stripe is partly
>> updated or the result of a disk error.
>
> Of course, this is an option, which should be
> taken into consideration.
>
> Any improvement idea is welcome!!!
>
> bye,
>