* Paranoid mode for RAID-1 ?
[not found] <1610581828.22506051.1430116628556.JavaMail.zimbra@laposte.net>
@ 2015-04-27 6:37 ` Jean-Baptiste Thomas
2015-04-27 6:48 ` Adam Goryachev
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Jean-Baptiste Thomas @ 2015-04-27 6:37 UTC (permalink / raw)
To: linux-raid
I'm looking for a way to get MD to operate in a mode in which
reading a sector from a RAID-1 device would not succeed until it
got matching data from at least two components.
Recent experience[1] suggests that a transient problem in one
disk can completely hose a four way RAID-1 array, which is
otherwise supposed to still be fine after a triple total
failure. I'm hoping that a paranoid mode would have prevented
that.
If there is such a thing, PLEASE tell me. If not, please tell me
so I don't waste any more time looking for it.
[1] "Massive RAID-1 desync"
http://www.spinics.net/lists/raid/msg48681.html
http://marc.info/?l=linux-raid&m=143003812706563&w=2
* Re: Paranoid mode for RAID-1 ?
2015-04-27 6:37 ` Paranoid mode for RAID-1 ? Jean-Baptiste Thomas
@ 2015-04-27 6:48 ` Adam Goryachev
2015-04-27 7:15 ` David Brown
2015-04-27 6:49 ` NeilBrown
2015-04-27 8:45 ` Pieter De Wit
2 siblings, 1 reply; 15+ messages in thread
From: Adam Goryachev @ 2015-04-27 6:48 UTC (permalink / raw)
To: linux-raid
On 27/04/15 16:37, Jean-Baptiste Thomas wrote:
> I'm looking for a way to get MD to operate in a mode in which
> reading a sector from a RAID-1 device would not succeed until it
> got matching data from at least two components.
>
> Recent experience[1] suggests that a transient problem in one
> disk can completely hose a four way RAID-1 array, which is
> otherwise supposed to still be fine after a triple total
> failure. I'm hoping that a paranoid mode would have prevented
> that.
There isn't any such thing that I am aware of in Linux MD RAID. However,
I've heard that if you want data integrity, then you could use zfs,
which supports RAID as well as data checksums to ensure that the data
read back matches the data you wrote....
Personally, I've never used zfs, and there might be other FS's that will
have the feature as well (eg, btrfs etc).
Hope that helps...
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Paranoid mode for RAID-1 ?
2015-04-27 6:37 ` Paranoid mode for RAID-1 ? Jean-Baptiste Thomas
2015-04-27 6:48 ` Adam Goryachev
@ 2015-04-27 6:49 ` NeilBrown
2015-04-27 10:52 ` Jean-Baptiste Thomas
2015-04-27 8:45 ` Pieter De Wit
2 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2015-04-27 6:49 UTC (permalink / raw)
To: Jean-Baptiste Thomas; +Cc: linux-raid
On Mon, 27 Apr 2015 08:37:59 +0200 (CEST) Jean-Baptiste Thomas
<cau2jeaf1honoq@laposte.net> wrote:
> I'm looking for a way to get MD to operate in a mode in which
> reading a sector from a RAID-1 device would not succeed until it
> got matching data from at least two components.
>
> Recent experience[1] suggests that a transient problem in one
> disk can completely hose a four way RAID-1 array, which is
> otherwise supposed to still be fine after a triple total
> failure. I'm hoping that a paranoid mode would have prevented
> that.
>
> If there is such a thing, PLEASE tell me. If not, please tell me
> so I don't waste any more time looking for it.
No, there is no such thing.
There "should" be no circumstance which would make it worth while.
A drive may well report an error, but it should *never* report incorrect data
as though it were correct. That is horribly broken.
The cost of running in a "safe" mode would be high, and the likely benefit
extremely low. So it is unlikely that anyone would use it for long. So
implementing it seems rather pointless.
That said: if someone were to provide an implementation I would certainly
consider reviewing it and adding it to md.
NeilBrown
>
> [1] "Massive RAID-1 desync"
> http://www.spinics.net/lists/raid/msg48681.html
> http://marc.info/?l=linux-raid&m=143003812706563&w=2
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Paranoid mode for RAID-1 ?
2015-04-27 6:48 ` Adam Goryachev
@ 2015-04-27 7:15 ` David Brown
2015-04-27 7:35 ` Mikael Abrahamsson
0 siblings, 1 reply; 15+ messages in thread
From: David Brown @ 2015-04-27 7:15 UTC (permalink / raw)
To: Adam Goryachev, linux-raid
On 27/04/15 08:48, Adam Goryachev wrote:
> On 27/04/15 16:37, Jean-Baptiste Thomas wrote:
>> I'm looking for a way to get MD to operate in a mode in which
>> reading a sector from a RAID-1 device would not succeed until it
>> got matching data from at least two components.
>>
>> Recent experience[1] suggests that a transient problem in one
>> disk can completely hose a four way RAID-1 array, which is
>> otherwise supposed to still be fine after a triple total
>> failure. I'm hoping that a paranoid mode would have prevented
>> that.
>
> There isn't any such thing that I am aware of in Linux MD RAID. However,
> I've heard that if you want data integrity, then you could use zfs,
> which supports RAID as well as data checksums to ensure that the data
> read back matches the data you wrote....
>
> Personally, I've never used zfs, and there might be other FS's that will
> have the feature as well (eg, btrfs etc).
>
btrfs has data checksums like that. Like Neil, I question the necessity
for harddisks, but such checksums are lower cost than reading the data
twice from two disks (as they are stored as part of the metadata that
you already read), and can offer some protection against serious
hardware problems. (Checksums like this cannot easily be implemented in
a transparent block device such as md raid - it is more practical to
have them as part of the filesystem, as done with btrfs.)
> Hope that helps...
>
> Regards,
> Adam
* Re: Paranoid mode for RAID-1 ?
2015-04-27 7:15 ` David Brown
@ 2015-04-27 7:35 ` Mikael Abrahamsson
2015-04-27 8:18 ` Adam Goryachev
0 siblings, 1 reply; 15+ messages in thread
From: Mikael Abrahamsson @ 2015-04-27 7:35 UTC (permalink / raw)
To: David Brown; +Cc: Adam Goryachev, linux-raid
On Mon, 27 Apr 2015, David Brown wrote:
> btrfs has data checksums like that. Like Neil, I question the necessity
> for harddisks, but such checksums are lower cost than reading the data
> twice from two disks (as they are stored as part of the metadata that
> you already read), and can offer some protection against serious
> hardware problems. (Checksums like this cannot easily be implemented in
> a transparent block device such as md raid - it is more practical to
> have them as part of the filesystem, as done with btrfs.)
Only way I can imagine this being done would be for instance to add a 4KiB
block for every 128KiB chunk or something like that, and perhaps have a
smaller checksum for each 4KiB block within that 128KiB chunk.
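The layout described above can be sketched in a few lines. This is only an illustration: putting the checksum block at the end of each group, and the name locate(), are assumptions made for the example, not anything md implements.

```python
BLOCK = 4096                                   # bytes per block
DATA_BLOCKS_PER_CHUNK = 128 * 1024 // BLOCK    # 32 data blocks per 128KiB chunk
GROUP = DATA_BLOCKS_PER_CHUNK + 1              # plus one 4KiB checksum block
OVERHEAD = 1 / GROUP                           # fraction of raw space spent on checksums


def locate(logical_block):
    """Map a logical data block number to its on-disk position.

    Returns (physical data block, physical checksum block, slot), where slot
    is the index of this block's checksum inside the checksum block.
    Assumes the checksum block sits after the 32 data blocks it covers.
    """
    chunk, slot = divmod(logical_block, DATA_BLOCKS_PER_CHUNK)
    base = chunk * GROUP
    return base + slot, base + DATA_BLOCKS_PER_CHUNK, slot
```

With this layout every small write has to update a data block and its checksum block, which may be one source of the "interesting" performance drawbacks.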
I doubt anyone would be interested in putting efforts into creating this
though as it would have "interesting" performance drawbacks, and that work
is probably better spent by making sure that btrfs and/or zfs gets more
development/testing than it is to put that effort into md. I personally
prefer md to be fairly "simple" so we have as few bugs as possible in it,
I'd say that md generally works and the number of developers working
heroically on its current incarnation is barely enough to make sure that
the codebase works as well as it must considering the critical function it
serves for a lot of us.
This has been discussed before and nobody has shown interest in actually
developing code for it, so we're still at the feature request and
"brainstorming about design" state, and without actual coder(s) willing to
actually implement, it's not going to get further than this stage.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Paranoid mode for RAID-1 ?
2015-04-27 7:35 ` Mikael Abrahamsson
@ 2015-04-27 8:18 ` Adam Goryachev
2015-04-27 8:34 ` Paranoid mode for RAID-1 ? MD-RAID checksums Pasi Kärkkäinen
2015-04-27 9:15 ` Paranoid mode for RAID-1 ? Roman Mamedov
0 siblings, 2 replies; 15+ messages in thread
From: Adam Goryachev @ 2015-04-27 8:18 UTC (permalink / raw)
To: Mikael Abrahamsson, David Brown; +Cc: linux-raid
On 27/04/15 17:35, Mikael Abrahamsson wrote:
> On Mon, 27 Apr 2015, David Brown wrote:
>
>> btrfs has data checksums like that. Like Neil, I question the
>> necessity for harddisks, but such checksums are lower cost than
>> reading the data twice from two disks (as they are stored as part of
>> the metadata that you already read), and can offer some protection
>> against serious hardware problems. (Checksums like this cannot
>> easily be implemented in a transparent block device such as md raid -
>> it is more practical to have them as part of the filesystem, as done
>> with btrfs.)
>
> Only way I can imagine this being done would be for instance to add a
> 4KiB block for every 128KiB chunk or something like that, and perhaps
> have a smaller checksum for each 4KiB block within that 128KiB chunk.
>
> I doubt anyone would be interested in putting efforts into creating
> this though as it would have "interesting" performance drawbacks, and
> that work is probably better spent by making sure that btrfs and/or
> zfs gets more development/testing than it is to put that effort into
> md. I personally prefer md to be fairly "simple" so we have as few
> bugs as possible in it, I'd say that md generally works and the number
> of developers working heroically on its current incarnation is barely
> enough to make sure that the codebase works as well as it must
> considering the critical function it serves for a lot of us.
>
> This has been discussed before and nobody has shown interest in
> actually developing code for it, so we're still at the feature request
> and "brainstorming about design" state, and without actual coder(s)
> willing to actually implement, it's not going to get further than this
> stage.
>
Speaking of which, I'm not convinced that we should spend that developer
time on each and every FS (eg, duplicated effort for btrfs, zfs, and any
others that do the same). It also means you must remove MD Raid, to
allow the FS to directly access each of the underlying devices.
Obviously, there are advantages in both methods.
As you and others said, without someone willing to implement/write this
feature, then it isn't going to happen.
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Paranoid mode for RAID-1 ? MD-RAID checksums
2015-04-27 8:18 ` Adam Goryachev
@ 2015-04-27 8:34 ` Pasi Kärkkäinen
2015-04-27 9:15 ` Paranoid mode for RAID-1 ? Roman Mamedov
1 sibling, 0 replies; 15+ messages in thread
From: Pasi Kärkkäinen @ 2015-04-27 8:34 UTC (permalink / raw)
To: Adam Goryachev; +Cc: Mikael Abrahamsson, David Brown, linux-raid
On Mon, Apr 27, 2015 at 06:18:57PM +1000, Adam Goryachev wrote:
> On 27/04/15 17:35, Mikael Abrahamsson wrote:
> >On Mon, 27 Apr 2015, David Brown wrote:
> >
> >>btrfs has data checksums like that. Like Neil, I question the
> >>necessity for harddisks, but such checksums are lower cost than
> >>reading the data twice from two disks (as they are stored as
> >>part of the metadata that you already read), and can offer some
> >>protection against serious hardware problems. (Checksums like
> >>this cannot easily be implemented in a transparent block device
> >>such as md raid - it is more practical to have them as part of
> >>the filesystem, as done with btrfs.)
> >
> >Only way I can imagine this being done would be for instance to
> >add a 4KiB block for every 128KiB chunk or something like that,
> >and perhaps have a smaller checksum for each 4KiB block within
> >that 128KiB chunk.
> >
> >I doubt anyone would be interested in putting efforts into
> >creating this though as it would have "interesting" performance
> >drawbacks, and that work is probably better spent by making sure
> >that btrfs and/or zfs gets more development/testing than it is to
> >put that effort into md. I personally prefer md to be fairly
> >"simple" so we have as few bugs as possible in it, I'd say that md
> >generally works and the number of developers working heroically on
> >its current incarnation is barely enough to make sure that the
> >codebase works as well as it must considering the critical
> >function it serves for a lot of us.
> >
> >This has been discussed before and nobody has shown interest in
> >actually developing code for it, so we're still at the feature
> >request and "brainstorming about design" state, and without actual
> >coder(s) willing to actually implement, it's not going to get
> >further than this stage.
> >
> Speaking of which, I'm not convinced that we should spend that
> developer time on each and every FS (eg, duplicated effort for
> btrfs, zfs, and any others that do the same). It also means you must
> remove MD Raid, to allow the FS to directly access each of the
> underlying devices. Obviously, there are advantages in both methods.
>
Yeah, having checksums support in MD-RAID would be very welcome!
> As you and others said, without someone willing to implement/write
> this feature, then it isn't going to happen.
>
Yeah this is the problem, someone actually needs to do it :)
There actually IS a proof-of-concept checksums support for MD-RAID,
but it was never upstreamed, and it was a quick and 'naive' implementation.
http://pages.cs.wisc.edu/~bpkroth/cs736/md-checksums/md-checksums-paper.pdf
http://pages.cs.wisc.edu/~bpkroth/cs736/md-checksums/
> Regards,
> Adam
>
-- Pasi
* Re: Paranoid mode for RAID-1 ?
2015-04-27 6:37 ` Paranoid mode for RAID-1 ? Jean-Baptiste Thomas
2015-04-27 6:48 ` Adam Goryachev
2015-04-27 6:49 ` NeilBrown
@ 2015-04-27 8:45 ` Pieter De Wit
2015-04-27 10:18 ` <DKIM> " Jean-Baptiste Thomas
2 siblings, 1 reply; 15+ messages in thread
From: Pieter De Wit @ 2015-04-27 8:45 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
Sorry for jumping in late - but let's say it does "work" and a drive returns an error, is that data lost ? Or which drive is "right"?
Sent from my iPhone
> On 27/04/2015, at 18:37, Jean-Baptiste Thomas <cau2jeaf1honoq@laposte.net> wrote:
>
> I'm looking for a way to get MD to operate in a mode in which
> reading a sector from a RAID-1 device would not succeed until it
> got matching data from at least two components.
>
> Recent experience[1] suggests that a transient problem in one
> disk can completely hose a four way RAID-1 array, which is
> otherwise supposed to still be fine after a triple total
> failure. I'm hoping that a paranoid mode would have prevented
> that.
>
> If there is such a thing, PLEASE tell me. If not, please tell me
> so I don't waste any more time looking for it.
>
> [1] "Massive RAID-1 desync"
> http://www.spinics.net/lists/raid/msg48681.html
> http://marc.info/?l=linux-raid&m=143003812706563&w=2
* Re: Paranoid mode for RAID-1 ?
2015-04-27 8:18 ` Adam Goryachev
2015-04-27 8:34 ` Paranoid mode for RAID-1 ? MD-RAID checksums Pasi Kärkkäinen
@ 2015-04-27 9:15 ` Roman Mamedov
1 sibling, 0 replies; 15+ messages in thread
From: Roman Mamedov @ 2015-04-27 9:15 UTC (permalink / raw)
To: Adam Goryachev; +Cc: Mikael Abrahamsson, David Brown, linux-raid
On Mon, 27 Apr 2015 18:18:57 +1000
Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
> Speaking of which, I'm not convinced that we should spend that developer
> time on each and every FS (eg, duplicated effort for btrfs, zfs, and any
> others that do the same)
There isn't "each and every" FS, depending on whom you ask there's just one FS
that you should use. :) The filesystem can also do checksums in a smarter way,
e.g. not checksum the free space. And developers of those filesystems aren't
going to abandon their checksum support or plans to add that just because the
underlying block device MIGHT be an MD RAID array of a new weird type.
However one place where adding corruption resilience to MD can be extremely
interesting and possibly almost "for free" (with no disk format change, for
example), is a constant full verification and corruption-healing RAID6.
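As a rough illustration of why RAID-6 could heal silent corruption without a format change: the two syndromes P and Q are enough to locate a single corrupted data block in a stripe. The sketch below assumes the conventional RAID-6 field GF(2^8) with polynomial 0x11d and generator 2; the function names are hypothetical and this is not md code.

```python
# GF(2^8) log/antilog tables; polynomial 0x11d and generator 2 are assumed here.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Compute (P, Q) for a list of equal-length data blocks, one per data disk."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for z, block in enumerate(data):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[z], byte)
    return bytes(p), bytes(q)


def verify_and_heal(data, p, q):
    """Check a stripe against its stored P and Q.

    Returns (blocks, bad), where bad is None (stripe consistent), "P" or "Q"
    (only that syndrome block disagrees), or the index of the data block that
    was repaired. Raises IOError if no single-block explanation fits.
    """
    p_now, q_now = syndromes(data)
    dp = bytes(a ^ b for a, b in zip(p, p_now))
    dq = bytes(a ^ b for a, b in zip(q, q_now))
    if not any(dp) and not any(dq):
        return list(data), None
    if not any(dq):
        return list(data), "P"
    if not any(dp):
        return list(data), "Q"
    # A single bad data block z gives dq[j] == g**z * dp[j] at every byte.
    candidates = set()
    for a, b in zip(dp, dq):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            raise IOError("more than one block is inconsistent")
        candidates.add((LOG[b] - LOG[a]) % 255)
    if len(candidates) != 1:
        raise IOError("more than one block is inconsistent")
    z = candidates.pop()
    if z >= len(data):
        raise IOError("more than one block is inconsistent")
    healed = list(data)
    healed[z] = bytes(d ^ e for d, e in zip(data[z], dp))
    return healed, z
```

The price is that every read touches the whole stripe, and the repair is only trustworthy if at most one block in the stripe is wrong.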
--
With respect,
Roman
* Re: <DKIM> Re: Paranoid mode for RAID-1 ?
2015-04-27 8:45 ` Pieter De Wit
@ 2015-04-27 10:18 ` Jean-Baptiste Thomas
2015-04-27 10:54 ` David Brown
0 siblings, 1 reply; 15+ messages in thread
From: Jean-Baptiste Thomas @ 2015-04-27 10:18 UTC (permalink / raw)
To: Pieter De Wit; +Cc: linux-raid
On 2015-04-27 20:45 +1200, Pieter De Wit wrote:
> Sorry for jumping in late - but let's say it does "work" and a
> drive returns an error, is that data lost ? Or which drive is
> "right"?
(Assuming that by "returns an error", you mean succeeds but the
data does not match what the other(s) returned.)
Let's say there is a setting for how many components must agree.
If they're not unanimous, read all the other components and look
for a majority. The components in the minority are flagged
faulty and the array is degraded but the read succeeds.
If there is no majority, retry a few times. If a majority is
found, all components which ever were in the minority are
flagged faulty and the array is degraded but the read succeeds.
If no majority is found, degrade all components, fail the read
and stop the array. Or whatever is needed to prevent all further
writes to this array and let the user investigate.
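The policy above can be sketched as follows. This is a user-space illustration only, with several assumptions: "majority" is taken to mean a strict majority of all components, every retry re-reads all components, and "flagged faulty" is modelled by returning a set of indices rather than degrading anything. The names paranoid_read and NoMajority are hypothetical.

```python
from collections import Counter


class NoMajority(IOError):
    """No value was returned by a strict majority of the components."""


def paranoid_read(readers, quorum=2, retries=3):
    """Read one sector from a mirror set.

    readers is a list of callables, one per component, each returning the
    sector contents as bytes. Returns (data, faulty), where faulty is the set
    of component indices that ever disagreed with the winning value.
    """
    first = {i: readers[i]() for i in range(quorum)}
    history = [first]                      # one {index: data} dict per round
    if len(set(first.values())) == 1:      # the quorum agrees: cheap common case
        return first[0], set()
    for _ in range(retries):
        copies = {i: read() for i, read in enumerate(readers)}
        history.append(copies)
        winner, votes = Counter(copies.values()).most_common(1)[0]
        if 2 * votes > len(readers):
            faulty = {i for rnd in history
                      for i, data in rnd.items() if data != winner}
            return winner, faulty
    raise NoMajority("no majority after %d rounds; stop the array" % retries)
```

In the common case this costs `quorum` reads and a comparison; the full re-read only happens after a mismatch.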
* Re: Paranoid mode for RAID-1 ?
2015-04-27 6:49 ` NeilBrown
@ 2015-04-27 10:52 ` Jean-Baptiste Thomas
2015-04-27 16:15 ` Wols Lists
0 siblings, 1 reply; 15+ messages in thread
From: Jean-Baptiste Thomas @ 2015-04-27 10:52 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 2015-04-27 16:49 +1000, NeilBrown wrote:
> On Mon, 27 Apr 2015 08:37:59 +0200 (CEST) Jean-Baptiste Thomas
> <cau2jeaf1honoq@laposte.net> wrote:
>
> > I'm looking for a way to get MD to operate in a mode in which
> > reading a sector from a RAID-1 device would not succeed until it
> > got matching data from at least two components.
>
> No, there is no such thing.
Thanks, now I can move on to working on plan B.
> There "should" be no circumstance which would make it worth while.
> A drive may well report an error, but it should *never* report
> incorrect data as though it were correct. That is horribly
> broken.
Isn't it. <g>
> The cost of running in a "safe" mode would be high, and the
> likely benefit extremely low. So it is unlikely that anyone
> would use it for long. So implementing it seems rather
> pointless.
How high would the cost be ?
Seems to me that a 4-component RAID-1 with a 2-component quorum
would incur no more I/O or CPU overhead than, say, a 4-component
RAID-6. Less, in fact, unless parity computation is faster than
memcmp().
Given the choice between that sort of cost and the possibility
of massive data corruption because one drive had a hiccup, I
would not even THINK about running without it.
> That said: if someone were to provide an implementation I
> would certainly consider reviewing it and adding it to md.
Great. Don't think it'll be me, though. :-/
* Re: <DKIM> Re: Paranoid mode for RAID-1 ?
2015-04-27 10:18 ` <DKIM> " Jean-Baptiste Thomas
@ 2015-04-27 10:54 ` David Brown
2015-04-27 12:36 ` Jean-Baptiste Thomas
0 siblings, 1 reply; 15+ messages in thread
From: David Brown @ 2015-04-27 10:54 UTC (permalink / raw)
To: Jean-Baptiste Thomas, Pieter De Wit; +Cc: linux-raid
On 27/04/15 12:18, Jean-Baptiste Thomas wrote:
> On 2015-04-27 20:45 +1200, Pieter De Wit wrote:
>
>> Sorry for jumping in late - but let's say it does "work" and a
>> drive returns an error, is that data lost ? Or which drive is
>> "right"?
>
> (Assuming that by "returns an error", you mean succeeds but the
> data does not match what the other(s) returned.)
The alternative interpretation here is that the drive returns an error
message saying it couldn't read the sector - then it's just standard
RAID (get the data from the other disks). So we are looking here at the
extremely rare situation where there is an error but the drive (or
controller) does not detect it.
>
> Let's say there is a setting for how many components must agree.
> If they're not unanimous, read all the other components and look
> for a majority. The components in the minority are flagged
> faulty and the array is degraded but the read succeeds.
>
> If there is no majority, retry a few times. If a majority is
> found, all components which ever were in the minority are
> flagged faulty and the array is degraded but the read succeeds.
>
> If no majority is found, degrade all components, fail the read
> and stop the array. Or whatever is needed to prevent all further
> writes to this array and let the user investigate.
The problem with all of these is that they /might/ be right - but they
/might/ be wrong and make matters worse. Even if you have 3 copies of
the sector, and get two matches and one different, there is no way to
determine that the odd one is wrong. Perhaps a common bus or connector
fault caused the other two to be wrong. Picking the "majority vote" may
decrease your chances of losing data (but may not - it depends on the
cause of the fault), but it certainly does not avoid the worst case
scenario. Perhaps the best choice during normal usage (as distinct from
recovery or rebuild, when the drive is not mounted) is to simply report
a failure to the layers higher up - that way you won't make matters
worse by returning data.
Note that the checksum method (used by btrfs and zfs) is different in
that it lets the system know exactly which copy was bad even if the
drive (and bus and controller) think it was good.
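A small sketch of that difference, assuming the expected checksum is stored separately from the data (for example in filesystem metadata). CRC-32 from zlib stands in for whatever checksum a real filesystem uses, and checksummed_read is a hypothetical name.

```python
import zlib


def checksummed_read(readers, expected_crc):
    """Return the first copy whose checksum matches, plus the bad copies seen.

    readers is a list of callables, one per mirror, each returning bytes.
    Returns (data, bad), where bad lists the indices of copies that were read
    and failed verification. Raises IOError if no copy verifies.
    """
    bad = []
    for i, read in enumerate(readers):
        data = read()
        if zlib.crc32(data) == expected_crc:
            return data, bad
        bad.append(i)
    raise IOError("every copy fails its checksum")
```

Unlike a vote, this still picks the right copy when two damaged copies happen to agree with each other, e.g. after a common bus or connector fault.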
* Re: Paranoid mode for RAID-1 ?
2015-04-27 10:54 ` David Brown
@ 2015-04-27 12:36 ` Jean-Baptiste Thomas
2015-04-27 13:46 ` David Brown
0 siblings, 1 reply; 15+ messages in thread
From: Jean-Baptiste Thomas @ 2015-04-27 12:36 UTC (permalink / raw)
To: David Brown; +Cc: linux-raid
On 2015-04-27 12:54 +0200, David Brown wrote:
> The problem with all of these is that they /might/ be right -
> but they /might/ be wrong and make matters worse. Even if you
> have 3 copies of the sector, and get two matches and one
> different, there is no way to determine that the odd one is
> wrong. Perhaps a common bus or connector fault caused the
> other two to be wrong. Picking the "majority vote" may
> decrease your chances of losing data (but may not - it depends
> on the cause of the fault), but it certainly does not avoid
> the worst case scenario.
So Neil's objection is that it's too paranoid and yours is that
it's not paranoid enough ? :-)
> Perhaps the best choice during normal usage (as distinct from
> recovery or rebuild, when the drive is not mounted) is to
> simply report a failure to the layers higher up - that way you
> won't make matters worse by returning data.
You may be right. The main points I think are that
a) the inconsistency be caught and reported and
b) writes be disabled before the propagation of errors buggers
up the whole file system.
* Re: Paranoid mode for RAID-1 ?
2015-04-27 12:36 ` Jean-Baptiste Thomas
@ 2015-04-27 13:46 ` David Brown
0 siblings, 0 replies; 15+ messages in thread
From: David Brown @ 2015-04-27 13:46 UTC (permalink / raw)
To: Jean-Baptiste Thomas; +Cc: linux-raid
On 27/04/15 14:36, Jean-Baptiste Thomas wrote:
> On 2015-04-27 12:54 +0200, David Brown wrote:
>
>> The problem with all of these is that they /might/ be right -
>> but they /might/ be wrong and make matters worse. Even if you
>> have 3 copies of the sector, and get two matches and one
>> different, there is no way to determine that the odd one is
>> wrong. Perhaps a common bus or connector fault caused the
>> other two to be wrong. Picking the "majority vote" may
>> decrease your chances of losing data (but may not - it depends
>> on the cause of the fault), but it certainly does not avoid
>> the worst case scenario.
>
> So Neil's objection is that it's too paranoid and yours is that
> it's not paranoid enough ? :-)
It's both - it's unlikely to be needed, and in cases where it is needed,
it's unlikely to help.
Neil has written articles about this before, which are worth reading:
<http://neil.brown.name/blog/20110227114201>
<http://neil.brown.name/blog/20100211050355>
>
>> Perhaps the best choice during normal usage (as distinct from
>> recovery or rebuild, when the drive is not mounted) is to
>> simply report a failure to the layers higher up - that way you
>> won't make matters worse by returning data.
>
> You may be right. The main points I think are that
> a) the inconsistency be caught and reported and
> b) writes be disabled before the propagation of errors buggers
> up the whole file system.
Yes, that all makes sense.
* Re: Paranoid mode for RAID-1 ?
2015-04-27 10:52 ` Jean-Baptiste Thomas
@ 2015-04-27 16:15 ` Wols Lists
0 siblings, 0 replies; 15+ messages in thread
From: Wols Lists @ 2015-04-27 16:15 UTC (permalink / raw)
To: Jean-Baptiste Thomas; +Cc: NeilBrown, linux-raid
On 27/04/15 11:52, Jean-Baptiste Thomas wrote:
> On 2015-04-27 16:49 +1000, NeilBrown wrote:
>> On Mon, 27 Apr 2015 08:37:59 +0200 (CEST) Jean-Baptiste Thomas
>> <cau2jeaf1honoq@laposte.net> wrote:
>>
>>> I'm looking for a way to get MD to operate in a mode in which
>>> reading a sector from a RAID-1 device would not succeed until it
>>> got matching data from at least two components.
>>
>> No, there is no such thing.
>
> Thanks, now I can move on to working on plan B.
>
>> There "should" be no circumstance which would make it worth while.
>> A drive may well report an error, but it should *never* report
>> incorrect data as though it were correct. That is horribly
>> broken.
>
> Isn't it. <g>
>
>> The cost of running in a "safe" mode would be high, and the
>> likely benefit extremely low. So it is unlikely that anyone
>> would use it for long. So implementing it seems rather
>> pointless.
>
> How high would the cost be ?
>
> Seems to me that a 4-component RAID-1 with a 2-component quorum
> would incur no more I/O or CPU overhead than, say, a 4-component
> RAID-6. Less, in fact, unless parity computation is faster than
> memcmp().
>
> Given the choice between that sort of cost and the possibility
> of massive data corruption because one drive had a hiccup, I
> would not even THINK about running without it.
Well, I've already mentioned the Pr1me technique, but imho that belongs
in the layer that actually writes to the disk. Snag is, it doubles the
required disk space so I don't know how it would fit ...
And I don't remember the maths so I can't tell you *how* it did it, but
for each byte of data it created a parity byte (which is why it doubles
the disk requirement). From that, a single-bit error was guaranteed to
tell you whether the data byte or the parity byte was wrong. If the data
was wrong, you could reconstruct it from the parity. If there was a
two-bit error, you stood a 95% chance or thereabouts of recovery.
So this isn't raid, it won't protect you against disk failure (unless
you put data and parity on separate disks, which then costs a
double-read instead), but at least a read can then return e_data_error
if something goes wrong. But you're looking at only 25% of your disk
space being "usable" for a fast mirror if you do this.
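The post does not give the maths of the Pr1me scheme, so the following is not that scheme. It is a standard extended Hamming(8,4) code, used here only as a stand-in with the same one-parity-byte-per-data-byte overhead, to show how a single-bit error can be located (data byte or parity byte) and corrected. All function names are hypothetical.

```python
def check_bits(nibble):
    """Four check bits for a 4-bit value (extended Hamming(8,4))."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    p0 = d1 ^ d2 ^ d3          # overall parity of the seven bits above
    return p1 | (p2 << 1) | (p3 << 2) | (p0 << 3)


def encode(byte):
    """Return the parity byte to store alongside a data byte."""
    return check_bits(byte & 0xF) | (check_bits(byte >> 4) << 4)


def _decode_nibble(data, check):
    """Return (corrected nibble, damaged side), side being None, 'data' or 'parity'."""
    for cand in range(16):
        dist = (bin(cand ^ data).count("1")
                + bin(check_bits(cand) ^ check).count("1"))
        if dist == 0:
            return cand, None
        if dist == 1:
            return cand, "data" if cand != data else "parity"
    raise IOError("uncorrectable error")   # two or more bits wrong in one codeword


def decode(byte, parity):
    """Return (corrected data byte, set of sides that needed correcting)."""
    low, side_low = _decode_nibble(byte & 0xF, parity & 0xF)
    high, side_high = _decode_nibble(byte >> 4, parity >> 4)
    return low | (high << 4), {s for s in (side_low, side_high) if s}
```

As in the description above, this guards individual reads rather than whole disks, and on top of a two-way mirror it leaves a quarter of the raw space usable.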
>
>> That said: if someone were to provide an implementation I
>> would certainly consider reviewing it and adding it to md.
>
> Great. Don't think it'll be me, though. :-/
Nor me neither. I'd love to try, but time is not my friend at the moment.
Cheers,
Wol