linux-raid.vger.kernel.org archive mirror
* raid6 rebuild
@ 2007-04-04 19:46 Lennert Buytenhek
  2007-04-05  3:22 ` Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Lennert Buytenhek @ 2007-04-04 19:46 UTC (permalink / raw)
  To: mingo, neilb, linux-raid

(please CC on replies, not subscribed to linux-raid@)

Hi!

While my RAID6 array was rebuilding after one disk had failed (which
I replaced), a second disk failed[*], and this caused the rebuild
process to start over from the beginning.

Why would the rebuild need to start over from the beginning in this
case?  Why couldn't it just continue from where it was?


thanks,
Lennert

[*] probably an entirely defective batch of 14 Samsung Spinpoint 500G disks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: raid6 rebuild
  2007-04-04 19:46 raid6 rebuild Lennert Buytenhek
@ 2007-04-05  3:22 ` Dan Williams
  2007-04-05  5:50   ` Lennert Buytenhek
  2007-04-05  6:03 ` Gordon Henderson
  2007-04-05 10:13 ` Andre Noll
  2 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2007-04-05  3:22 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: mingo, neilb, linux-raid

On 4/4/07, Lennert Buytenhek <buytenh@wantstofly.org> wrote:
> (please CC on replies, not subscribed to linux-raid@)
>
> Hi!
>
> While my RAID6 array was rebuilding after one disk had failed (which
> I replaced), a second disk failed[*], and this caused the rebuild
> process to start over from the beginning.
>
> Why would the rebuild need to start over from the beginning in this
> case?  Why couldn't it just continue from where it was?
>
I believe it is because raid5 and raid6 share the same error handler,
which sets MD_RECOVERY_ERR after losing any disk.  It should probably
not set this flag in the one-disk-lost raid6 case, but I might be
overlooking something else.
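
For illustration, here is a toy model of the policy under discussion
(this is not the md kernel code; the function name and the decision
rule are hypothetical simplifications):

```python
# Toy model of "when must an in-progress rebuild be abandoned?"
# The real decision lives in the md kernel driver; this just captures
# the idea that redundancy, not any single failure, should decide.

def should_abort_rebuild(level: int, failed_disks: int) -> bool:
    """Return True if a rebuild in progress has to restart/abort."""
    # How many simultaneous disk losses each level survives.
    max_degraded = {5: 1, 6: 2}[level]
    # Only abort when more disks are lost than the level tolerates:
    # a raid6 array with one freshly failed disk can keep rebuilding.
    return failed_disks > max_degraded

# The behavior reported in this thread corresponds to aborting whenever
# failed_disks > 0 -- i.e. the raid5 rule applied to raid6.
```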

>
> thanks,
> Lennert
>

Dan


* Re: raid6 rebuild
  2007-04-05  3:22 ` Dan Williams
@ 2007-04-05  5:50   ` Lennert Buytenhek
  2007-04-05 13:54     ` Bill Davidsen
  0 siblings, 1 reply; 12+ messages in thread
From: Lennert Buytenhek @ 2007-04-05  5:50 UTC (permalink / raw)
  To: Dan Williams; +Cc: mingo, neilb, linux-raid

On Wed, Apr 04, 2007 at 08:22:00PM -0700, Dan Williams wrote:

> >While my RAID6 array was rebuilding after one disk had failed (which
> >I replaced), a second disk failed[*], and this caused the rebuild
> >process to start over from the beginning.
> >
> >Why would the rebuild need to start over from the beginning in this
> >case?  Why couldn't it just continue from where it was?
> 
> I believe it is because raid5 and raid6 share the same error handler
> which sets MD_RECOVERY_ERR after losing any disk.  It should probably
> not set this flag in 1-disk lost raid6 case, but I might be
> overlooking something else.

Right, so you're saying that it's probably a bug rather than an
intentional 'feature'?


* Re: raid6 rebuild
  2007-04-04 19:46 raid6 rebuild Lennert Buytenhek
  2007-04-05  3:22 ` Dan Williams
@ 2007-04-05  6:03 ` Gordon Henderson
  2007-04-05  6:21   ` Lennert Buytenhek
  2007-04-05 10:13 ` Andre Noll
  2 siblings, 1 reply; 12+ messages in thread
From: Gordon Henderson @ 2007-04-05  6:03 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: linux-raid

On Wed, 4 Apr 2007, Lennert Buytenhek wrote:

> (please CC on replies, not subscribed to linux-raid@)
>
> Hi!
>
> While my RAID6 array was rebuilding after one disk had failed (which
> I replaced), a second disk failed[*], and this caused the rebuild
> process to start over from the beginning.
>
> Why would the rebuild need to start over from the beginning in this
> case?  Why couldn't it just continue from where it was?

I can't answer your question, just make the comment: Hurrah for RAID-6!

I presume the restarted rebuild went OK?

> [*] probably an entirely defective batch of 14 Samsung Spinpoint 500G disks

Let's hope not... Keep checking those SMART values...

Gordon


* Re: raid6 rebuild
  2007-04-05  6:03 ` Gordon Henderson
@ 2007-04-05  6:21   ` Lennert Buytenhek
  2007-04-05  8:15     ` Gordon Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: Lennert Buytenhek @ 2007-04-05  6:21 UTC (permalink / raw)
  To: Gordon Henderson; +Cc: linux-raid

On Thu, Apr 05, 2007 at 07:03:08AM +0100, Gordon Henderson wrote:

> > While my RAID6 array was rebuilding after one disk had failed (which
> > I replaced), a second disk failed[*], and this caused the rebuild
> > process to start over from the beginning.
> >
> > Why would the rebuild need to start over from the beginning in this
> > case?  Why couldn't it just continue from where it was?
> 
> I can't answer your question, just make the comment: Hurrah for RAID-6!

Agreed, although I'm still only just in the testing phase with this
array.  It isn't in production yet, and I don't have any important data
on the array that I don't have backups of.


> I presume the restarted rebuild went OK?

Yep, it had just finished when I woke up, so the array is now back
from double-degraded to degraded.  Just in time for the third disk to
start failing; that one will probably die during the rebuild of the
second failed disk, after I swap it out in a couple of minutes..


> > [*] probably an entirely defective batch of 14 Samsung Spinpoint
> > 500G disks
> 
> Lets hope not... Keep checking those SMART values...

Failed disk #2 still reports a SMART status of "PASSED"..


* Re: raid6 rebuild
  2007-04-05  6:21   ` Lennert Buytenhek
@ 2007-04-05  8:15     ` Gordon Henderson
  2007-04-05  8:48       ` Lennert Buytenhek
  0 siblings, 1 reply; 12+ messages in thread
From: Gordon Henderson @ 2007-04-05  8:15 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: linux-raid

On Thu, 5 Apr 2007, Lennert Buytenhek wrote:

>>> [*] probably an entirely defective batch of 14 Samsung Spinpoint
>>> 500G disks
>>
>> Lets hope not... Keep checking those SMART values...
>
> Failed disk #2 still reports a SMART status of "PASSED"..

Hmmm.. I'd be tempted to double-check your hardware and kernel versions
then.  I did have an older SCSI array on an older server (dual 500MHz
Xeons!) give me the occasional sector read error, but when going back and
running badblocks (read-only) on the disk, it checked out just fine... I
never got to the bottom of it, but it only started happening when I went
from a 2.4 kernel, where it was configured as 2 x 4-disk RAID-5 arrays,
to a 2.6 one, where it was configured as a single 8-drive RAID-6 array
with the new Adaptec SCSI driver... (My suspicion was old hardware
combined with new drivers and a slight motherboard timing issue, as the
RAID-6 code might well drive the underlying hardware "harder" than when
it was in dual RAID-5 mode, but it was retired before it became a
serious issue.)

Gordon


* Re: raid6 rebuild
  2007-04-05  8:15     ` Gordon Henderson
@ 2007-04-05  8:48       ` Lennert Buytenhek
  0 siblings, 0 replies; 12+ messages in thread
From: Lennert Buytenhek @ 2007-04-05  8:48 UTC (permalink / raw)
  To: Gordon Henderson; +Cc: linux-raid

On Thu, Apr 05, 2007 at 09:15:38AM +0100, Gordon Henderson wrote:

> >>>[*] probably an entirely defective batch of 14 Samsung Spinpoint
> >>>500G disks
> >>
> >>Lets hope not... Keep checking those SMART values...
> >
> >Failed disk #2 still reports a SMART status of "PASSED"..
> 
> Hmmm.. I'd be tempted to double check your hardware + kernel versions
> then 

I think in this case it's more of a disagreement between me and the
disk about whether certain levels and patterns of bad sectors are
acceptable or not.

dd'ing the disk to /dev/null keeps reporting new UncorrectableError
sectors, which overwriting with zeroes then supposedly fixes (judging
by the increasing Reallocated_Sector_Ct), but a subsequent read from
the very same sectors reports the very same UncorrectableError errors
again, plus new errors from other sectors.

The SMART short offline tests also report "Completed: read failure".

Meanwhile, the disk still feels that its SMART status is PASSED.

(FYI, these are Samsung Spinpoint T166S HD501LJ disks with serial
numbers in the range S0VVJ1KP3xxxxx)


* Re: raid6 rebuild
  2007-04-04 19:46 raid6 rebuild Lennert Buytenhek
  2007-04-05  3:22 ` Dan Williams
  2007-04-05  6:03 ` Gordon Henderson
@ 2007-04-05 10:13 ` Andre Noll
  2 siblings, 0 replies; 12+ messages in thread
From: Andre Noll @ 2007-04-05 10:13 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: mingo, neilb, linux-raid


On 21:46, Lennert Buytenhek wrote:

> While my RAID6 array was rebuilding after one disk had failed (which
> I replaced), a second disk failed[*], and this caused the rebuild
> process to start over from the beginning.
> 
> Why would the rebuild need to start over from the beginning in this
> case?  Why couldn't it just continue from where it was?

I think this is because a simpler algorithm can be used to rebuild
from a one-disk failure than is needed in the two-disk failure case.
For a one-disk failure you can simply do the following:

rebuild P or Q: just recompute it from the data disks
rebuild data: xor the remaining data and P, as in raid5 (in fact, the
raid5 code is used in this case)

It's only the rebuild from a two-disk failure that requires deeper
and more expensive math.

So if a second disk dies during the rebuild, the raid6 code must
switch from the easy algorithm to the "difficult" algorithm to finish
rebuilding the first failed disk.  Theoretically this could be done
on the fly, but I think the current code doesn't do that.
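
The difference between the two cases can be sketched in a few lines of
illustrative (non-kernel) Python, using the standard RAID6 arithmetic
over GF(2^8) with generator polynomial 0x11d and, for brevity, a single
byte per disk:

```python
# RAID6 P/Q parity over GF(2^8), one byte per disk.

# Build exp/log tables for GF(2^8) with the RAID6 polynomial 0x11d.
exp = [0] * 512
log = [0] * 256
v = 1
for i in range(255):
    exp[i] = v
    log[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11d
for i in range(255, 512):          # doubled so gmul needs no modulo
    exp[i] = exp[i - 255]

def gmul(a, b):
    """Multiply in GF(2^8)."""
    return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]

def parity(data):
    """P is the plain XOR of the data; Q is the sum of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gmul(exp[i], d)
    return p, q

data = [0x11, 0x22, 0x33, 0x44]
p, q = parity(data)

# Easy case: one data disk (index 2) lost, P intact -- pure XOR,
# exactly the raid5 path.
others_xor = 0
for i, d in enumerate(data):
    if i != 2:
        others_xor ^= d
assert p ^ others_xor == data[2]

# Hard case: data disk 2 AND P lost -- fall back on Q:
# D_x = (Q + Q_partial) * g^(-x), the "more expensive math".
q_partial = 0
for i, d in enumerate(data):
    if i != 2:
        q_partial ^= gmul(exp[i], d)
assert gmul(q ^ q_partial, exp[255 - 2]) == data[2]
```

(The real md code works on whole stripes and uses optimized table and
SIMD routines, but the field arithmetic is the same.)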

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: raid6 rebuild
  2007-04-05  5:50   ` Lennert Buytenhek
@ 2007-04-05 13:54     ` Bill Davidsen
  2007-04-05 14:06       ` Lennert Buytenhek
  0 siblings, 1 reply; 12+ messages in thread
From: Bill Davidsen @ 2007-04-05 13:54 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: Dan Williams, mingo, neilb, linux-raid

Lennert Buytenhek wrote:
> On Wed, Apr 04, 2007 at 08:22:00PM -0700, Dan Williams wrote:
>
>   
>>> While my RAID6 array was rebuilding after one disk had failed (which
>>> I replaced), a second disk failed[*], and this caused the rebuild
>>> process to start over from the beginning.
>>>
>>> Why would the rebuild need to start over from the beginning in this
>>> case?  Why couldn't it just continue from where it was?
>>>       
>> I believe it is because raid5 and raid6 share the same error handler
>> which sets MD_RECOVERY_ERR after losing any disk.  It should probably
>> not set this flag in 1-disk lost raid6 case, but I might be
>> overlooking something else.
>>     
>
> Right, so you're saying that it's probably a bug rather than an
> intentional 'feature'?

No, I would say it's a "reviewable design decision."  To be pedantic
(as I often am), a bug really means that unintended results are
generated.  In this case I think the code functions as intended, but it
might be able to safely take some other action.

I confess I would feel safer with my data if the rebuild started over;
I would like to be sure that when it (finally) finishes, the data are
valid.  If you replaced the 2nd drive, then a full rebuild would be
required in any case, to get ALL drives valid.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: raid6 rebuild
  2007-04-05 13:54     ` Bill Davidsen
@ 2007-04-05 14:06       ` Lennert Buytenhek
  2007-04-05 16:59         ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Lennert Buytenhek @ 2007-04-05 14:06 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Dan Williams, mingo, neilb, linux-raid

On Thu, Apr 05, 2007 at 09:54:14AM -0400, Bill Davidsen wrote:

> I confess, I would feel safer with my data if the rebuild started
> over, I would like to be sure that when it (finally) finishes the
> data are valid.

With disk #3 about to die, I'd have felt safer if it first finished
rebuilding the replacement disk for failed disk #1 (that rebuild had
almost completed at that point), safeguarding the array against a
third disk failure.

Yeah, I know, you're not supposed to lose three disks within one day.


> If you replaced the 2nd drive, then a full rebuild would be required
> in any case, to get ALL drives valid.

You could finish the last little bit of the resync of the replacement
for failed disk #1 (by looking at P+Q), and then re-sync the replacement
for failed disk #2 (by looking only at P..)


* Re: raid6 rebuild
  2007-04-05 14:06       ` Lennert Buytenhek
@ 2007-04-05 16:59         ` Dan Williams
  2007-04-11  1:43           ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2007-04-05 16:59 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: Bill Davidsen, mingo, neilb, linux-raid

On 4/5/07, Lennert Buytenhek <buytenh@wantstofly.org> wrote:
> On Thu, Apr 05, 2007 at 09:54:14AM -0400, Bill Davidsen wrote:
>
> > I confess, I would feel safer with my data if the rebuild started
> > over, I would like to be sure that when it (finally) finishes the
> > data are valid.
>
> With disk #3 about to die, I'd have felt safer if it first finished
> rebuilding the replacement disk for failed disk #1 (that rebuild had
> almost completed at that point), safeguarding the array against a
> third disk failure.
>
I agree, the current arrangement seems to throw away a significant
amount of work.  Yes, you will need to resync when re-adding the
second disk, but in the meantime the array might as well try to get
back to a redundant state as soon as it can.

> Yeah, I know, you're not supposed to lose three disks within one day.
>
>
> > If you replaced the 2nd drive, then a full rebuild would be required
> > in any case, to get ALL drives valid.
>
> You could finish the last little bit of the resync of the replacement
> for failed disk #1 (by looking at P+Q), and then re-sync the replacement
> for failed disk #2 (by looking only at P..)
> -

--
Dan


* Re: raid6 rebuild
  2007-04-05 16:59         ` Dan Williams
@ 2007-04-11  1:43           ` Neil Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Neil Brown @ 2007-04-11  1:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: Lennert Buytenhek, Bill Davidsen, mingo, linux-raid

On Thursday April 5, dan.j.williams@intel.com wrote:
> On 4/5/07, Lennert Buytenhek <buytenh@wantstofly.org> wrote:
> > On Thu, Apr 05, 2007 at 09:54:14AM -0400, Bill Davidsen wrote:
> >
> > > I confess, I would feel safer with my data if the rebuild started
> > > over, I would like to be sure that when it (finally) finishes the
> > > data are valid.
> >
> > With disk #3 about to die, I'd have felt safer if it first finished
> > rebuilding the replacement disk for failed disk #1 (that rebuild had
> > almost completed at that point), safeguarding the array against a
> > third disk failure.
> >
> I agree, the current arrangement seems to throw away a significant
> amount of work.  Yes, you will need to resync when re-adding the
> second disk, but in the meantime might as well try to get a redundant
> mode at all costs.

Yes, I think you are right.
If you want it to restart from the beginning you can always abort the
current resync with 'echo idle > sync_action'.
The question is: is it really as simple to do as it sounds?
I seem to remember that aborting the recovery on any error was an
easy way to avoid some nasty race, but I have no idea what the race
was.
One would need to enumerate all the interesting cases and make sure
they all work as expected.  I cannot think of any problems
immediately, but that doesn't mean there aren't any...

It is now on my todo list...

NeilBrown


Thread overview: 12+ messages
2007-04-04 19:46 raid6 rebuild Lennert Buytenhek
2007-04-05  3:22 ` Dan Williams
2007-04-05  5:50   ` Lennert Buytenhek
2007-04-05 13:54     ` Bill Davidsen
2007-04-05 14:06       ` Lennert Buytenhek
2007-04-05 16:59         ` Dan Williams
2007-04-11  1:43           ` Neil Brown
2007-04-05  6:03 ` Gordon Henderson
2007-04-05  6:21   ` Lennert Buytenhek
2007-04-05  8:15     ` Gordon Henderson
2007-04-05  8:48       ` Lennert Buytenhek
2007-04-05 10:13 ` Andre Noll
