linux-raid.vger.kernel.org archive mirror
* Re: raid6 check/repair
@ 2007-11-21 13:45 Thiemo Nagel
  2007-12-14 15:25 ` Thiemo Nagel
  0 siblings, 1 reply; 20+ messages in thread
From: Thiemo Nagel @ 2007-11-21 13:45 UTC (permalink / raw)
  To: neilb, linux-raid

Dear Neil,

 >> I have been looking a bit at the check/repair functionality in the
 >> raid6 personality.
 >>
 >> It seems that if an inconsistent stripe is found during repair, md
 >> does not try to determine which block is corrupt (using e.g. the
 >> method in section 4 of HPA's raid6 paper), but just recomputes the
 >> parity blocks - i.e. the same way as inconsistent raid5 stripes are
 >> handled.
 >>
 >> Correct?
 >
 > Correct!
 >
 > The most likely cause of parity being incorrect is if a write to
 > data + P + Q was interrupted when one or two of those had been
 > written, but the others had not.
 >
 > No matter which was or was not written, recomputing P and Q will
 > produce a 'correct' result, and it is simple.  I really don't see any
 > justification for being more clever.

My opinion about that is quite different.  Speaking just for myself:

a) When I put my data on a RAID running on Linux, I'd expect the 
software to do everything possible to protect data integrity and, when 
necessary, to restore it.  (This expectation was one of the reasons why 
I chose software RAID with Linux.)

b) As a consequence of a):  When I'm using a RAID level that has extra 
redundancy, I'd expect Linux to make use of that extra redundancy during 
a 'repair'.  (Otherwise I'd consider repair a misnomer and rather call 
it 'recalc parity'.)

c) Why should 'repair' be implemented in a way that works only in most 
cases when there exists a solution that works in all cases?  (After all, 
the possibilities for corruption are many, e.g. bad RAM, bad cables, 
chipset bugs, driver bugs, and last but not least human mistakes.  From 
all these errors I'd like to be able to recover gracefully without 
putting the array at risk by removing and re-adding a component device.)

Bottom line:  So far I have been talking about *my* expectations.  Is it 
reasonable to assume they are shared by others?  Are there any 
arguments that I'm not aware of that speak against an improved 
implementation of 'repair'?

BTW:  I just checked, it's the same for RAID 1:  When I intentionally 
corrupt a sector in the first device of a set of 16, 'repair' copies the 
corrupted data to the 15 remaining devices instead of restoring the 
correct sector from one of the other 15 devices to the first.

Thank you for your time.

Kind regards,

Thiemo Nagel

P.S.:  I've re-sent this mail as the first one didn't get through 
majordomo.  (Yes, it had a vcard attached.  Yes, I have been told.  Yes, 
I am sorry.)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: raid6 check/repair
@ 2007-11-21 13:25 Thiemo Nagel
  2007-11-22  3:55 ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Thiemo Nagel @ 2007-11-21 13:25 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2322 bytes --]

Dear Neil,

>> I have been looking a bit at the check/repair functionality in the
>> raid6 personality.
>> 
>> It seems that if an inconsistent stripe is found during repair, md
>> does not try to determine which block is corrupt (using e.g. the
>> method in section 4 of HPA's raid6 paper), but just recomputes the
>> parity blocks - i.e. the same way as inconsistent raid5 stripes are
>> handled.
>> 
>> Correct?
> 
> Correct!
> 
> The most likely cause of parity being incorrect is if a write to
> data + P + Q was interrupted when one or two of those had been
> written, but the others had not.
> 
> No matter which was or was not written, recomputing P and Q will
> produce a 'correct' result, and it is simple.  I really don't see any
> justification for being more clever.

My opinion about that is quite different.  Speaking just for myself:

a) When I put my data on a RAID running on Linux, I'd expect the 
software to do everything possible to protect data integrity and, when 
necessary, to restore it.  (This expectation was one of the reasons why 
I chose software RAID with Linux.)

b) As a consequence of a):  When I'm using a RAID level that has extra 
redundancy, I'd expect Linux to make use of that extra redundancy during 
a 'repair'.  (Otherwise I'd consider repair a misnomer and rather call 
it 'recalc parity'.)

c) Why should 'repair' be implemented in a way that works only in most 
cases when there exists a solution that works in all cases?  (After all, 
the possibilities for corruption are many, e.g. bad RAM, bad cables, 
chipset bugs, driver bugs, and last but not least human mistakes.  From 
all these errors I'd like to be able to recover gracefully without 
putting the array at risk by removing and re-adding a component device.)

Bottom line:  So far I have been talking about *my* expectations.  Is it 
reasonable to assume they are shared by others?  Are there any 
arguments that I'm not aware of that speak against an improved 
implementation of 'repair'?

BTW:  I just checked, it's the same for RAID 1:  When I intentionally 
corrupt a sector in the first device of a set of 16, 'repair' copies the 
corrupted data to the 15 remaining devices instead of restoring the 
correct sector from one of the other 15 devices to the first.

Thank you for your time.

Kind regards,

Thiemo Nagel

[-- Attachment #2: thiemo_nagel.vcf --]
[-- Type: text/x-vcard, Size: 328 bytes --]



* raid6 check/repair
@ 2007-11-15 15:28 Leif Nixon
  2007-11-16  4:26 ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Leif Nixon @ 2007-11-15 15:28 UTC (permalink / raw)
  To: linux-raid

Hi,

I have been looking a bit at the check/repair functionality in the
raid6 personality.

It seems that if an inconsistent stripe is found during repair, md
does not try to determine which block is corrupt (using e.g. the
method in section 4 of HPA's raid6 paper), but just recomputes the
parity blocks - i.e. the same way as inconsistent raid5 stripes are
handled.

Correct?

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


end of thread, other threads:[~2007-12-14 15:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-21 13:45 raid6 check/repair Thiemo Nagel
2007-12-14 15:25 ` Thiemo Nagel
  -- strict thread matches above, loose matches on Subject: below --
2007-11-21 13:25 Thiemo Nagel
2007-11-22  3:55 ` Neil Brown
2007-11-22 16:51   ` Thiemo Nagel
2007-11-27  5:08     ` Bill Davidsen
2007-11-29  6:04       ` Neil Brown
2007-11-29  6:01     ` Neil Brown
2007-11-29 19:30       ` Bill Davidsen
2007-11-29 23:17       ` Eyal Lebedinsky
2007-11-30 14:42         ` Thiemo Nagel
     [not found]           ` <1196650421.14411.10.camel@elara.tcw.local>
     [not found]             ` <47546019.5030300@ph.tum.de>
2007-12-04 21:07               ` Peter Grandi
2007-12-05  6:53                 ` Mikael Abrahamsson
2007-12-05  9:00                 ` Leif Nixon
2007-12-05 20:31                 ` Bill Davidsen
2007-12-06 18:27                   ` Andre Noll
2007-12-07 17:34                   ` Gabor Gombas
2007-11-30 18:34       ` Thiemo Nagel
2007-11-15 15:28 Leif Nixon
2007-11-16  4:26 ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).