* Requesting migrate device options for raid5/6
@ 2007-10-29 16:19 Goswin von Brederlow
From: Goswin von Brederlow @ 2007-10-29 16:19 UTC (permalink / raw)
To: linux-raid
Hi,
I would welcome it if someone could work on a new feature for raid5/6
that would allow replacing a disk in a raid5/6 with a new one without
having to degrade the array.
Consider the following situation:
raid5 md0 : sda sdb sdc
Now sda gives a "SMART - failure imminent" warning and you want to
replace it with sdd.
% mdadm --fail /dev/md0 /dev/sda
% mdadm --remove /dev/md0 /dev/sda
% mdadm --add /dev/md0 /dev/sdd
Further consider that drive sdb will give an I/O error during the resync
of the array, or fail completely. The array is in degraded mode, so you
experience data loss.
But that is completely avoidable, and some hardware raids support disk
migration too. Loosely speaking, the kernel should do the following:
raid5 md0 : sda sdb sdc
-> create internal raid1 or dm-mirror
raid1 mdT : sda
raid5 md0 : mdT sdb sdc
-> hot add sdd to mdT
raid1 mdT : sda sdd
raid5 md0 : mdT sdb sdc
-> resync and then drop sda
raid1 mdT : sdd
raid5 md0 : mdT sdb sdc
-> remove internal mirror
raid5 md0 : sdd sdb sdc
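(For comparison, roughly the same dance can be done by hand today, but
only if md0 was built on single-member raid1 arrays from the start.
Assuming such a layer exists and is called md1 (the name is purely
illustrative), the migration would be:
% mdadm --add /dev/md1 /dev/sdd
% cat /proc/mdstat                  # wait until the raid1 resync is done
% mdadm --fail /dev/md1 /dev/sda
% mdadm --remove /dev/md1 /dev/sda
The point of this request is to get the same effect on any array,
without having prepared such a layer beforehand.)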
Thoughts?
MfG
Goswin
* Re: Requesting migrate device options for raid5/6
@ 2007-11-01 14:51 Bill Davidsen
From: Bill Davidsen @ 2007-11-01 14:51 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: linux-raid
Goswin von Brederlow wrote:
> Hi,
>
> I would welcome it if someone could work on a new feature for raid5/6
> that would allow replacing a disk in a raid5/6 with a new one without
> having to degrade the array.
>
> Consider the following situation:
>
> raid5 md0 : sda sdb sdc
>
> Now sda gives a "SMART - failure imminent" warning and you want to
> replace it with sdd.
>
> % mdadm --fail /dev/md0 /dev/sda
> % mdadm --remove /dev/md0 /dev/sda
> % mdadm --add /dev/md0 /dev/sdd
>
> Further consider that drive sdb will give an I/O error during the resync
> of the array, or fail completely. The array is in degraded mode, so you
> experience data loss.
>
>
That's a two-drive failure, so you will lose data.
> But that is completely avoidable, and some hardware raids support disk
> migration too. Loosely speaking, the kernel should do the following:
>
>
No, it's not "completely avoidable", because you have described sda as
ready to fail and sdb as "will give an I/O error", so if both happen at
once you will lose data because you have no valid copy. That said, some
of what you describe below could *reduce* the probability of failure.
But if sdb is going to have I/O errors, you really need to replace two
drives :-(
See below for some thoughts.
> raid5 md0 : sda sdb sdc
> -> create internal raid1 or dm-mirror
> raid1 mdT : sda
> raid5 md0 : mdT sdb sdc
> -> hot add sdd to mdT
> raid1 mdT : sda sdd
> raid5 md0 : mdT sdb sdc
> -> resync and then drop sda
> raid1 mdT : sdd
> raid5 md0 : mdT sdb sdc
> -> remove internal mirror
> raid5 md0 : sdd sdb sdc
>
>
> Thoughts?
>
If there were a "migrate" option, it might work something like this:
Given a migrate from sda to sdd, as you noted, a raid1 between sda and
sdd needs to be created, and obviously all chunks of sdd need to be
marked as needing rebuild. In addition, sda needs to be made read-only,
to minimize the I/O and to prevent any errors that might come from a
failed write, like failed sector relocates, etc. Also, if valid data for
a chunk is on sdd, no read would be done from sda. I think there's
relevant code in the "write-mostly" bits that could be used to keep I/O
to sda to a minimum: no writes, and only mandatory reads when no valid
chunk is on sdd yet. This is similar to recovery to a spare, except that
most data will still be valid on the failing drive and doesn't need to
be recreated; only unreadable data must be rebuilt the slow way.
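If I remember the sysfs interface correctly, the write-mostly flag can
already be toggled per device at run time; with the names from above,
and the hypothetical mdT as the temporary raid1, something like:
% echo writemostly > /sys/block/mdT/md/dev-sda/state
should make normal reads prefer sdd once it is in sync. That only covers
the read side, though; the "no writes to sda" part would need new code.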
Care is needed for sda as well, so that if sdd fails during the migrate,
a last-chance attempt to bring sda back to useful content can be made.
I'm paranoid that way.
Assuming the migrate works correctly, sda is removed from the array, and
the superblock should be marked to reflect that. Now sdd is a part of
the array, and assemble, at least using UUID, should work.
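Picking up the migrated set afterwards should then be a matter of
something like (the UUID being a placeholder for the real array UUID):
% mdadm --assemble /dev/md0 --uuid=<array-uuid> /dev/sdb /dev/sdc /dev/sdd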
I personally think that a migrate capability would be vastly useful,
both for handling failing drives and just moving data to a better place.
As you point out, the user commands are not *quite* as robust as an
internal implementation could be, and are complex enough to invite user
error. I certainly always write down the steps before doing a migrate,
and if possible do it with the system booted from rescue media.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: Requesting migrate device options for raid5/6
@ 2007-11-07 8:37 Goswin von Brederlow
From: Goswin von Brederlow @ 2007-11-07 8:37 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Goswin von Brederlow, linux-raid
Bill Davidsen <davidsen@tmr.com> writes:
> Goswin von Brederlow wrote:
>> Hi,
>>
>> I would welcome it if someone could work on a new feature for raid5/6
>> that would allow replacing a disk in a raid5/6 with a new one without
>> having to degrade the array.
>>
>> Consider the following situation:
>>
>> raid5 md0 : sda sdb sdc
>>
>> Now sda gives a "SMART - failure imminent" warning and you want to
>> replace it with sdd.
>>
>> % mdadm --fail /dev/md0 /dev/sda
>> % mdadm --remove /dev/md0 /dev/sda
>> % mdadm --add /dev/md0 /dev/sdd
>>
>> Further consider that drive sdb will give an I/O error during the resync
>> of the array, or fail completely. The array is in degraded mode, so you
>> experience data loss.
>>
>>
> That's a two-drive failure, so you will lose data.
>> But that is completely avoidable, and some hardware raids support disk
>> migration too. Loosely speaking, the kernel should do the following:
>>
>>
> No, it's not "completely avoidable", because you have described sda as
> ready to fail and sdb as "will give an I/O error", so if both happen at
> once you will lose data because you have no valid copy. That said, some of
But sda has not failed _yet_; I just suspect it will. As long as it
doesn't actually fail, it can compensate for sdb failing. The problem is
that you have to remove sda to replace it even though it is still working.
> what you describe below could *reduce* the probability of failure. But
> if sdb is going to have I/O errors, you really need to replace two
> drives :-(
> See below for some thoughts.
>> raid5 md0 : sda sdb sdc
>> -> create internal raid1 or dm-mirror
>> raid1 mdT : sda
>> raid5 md0 : mdT sdb sdc
>> -> hot add sdd to mdT
>> raid1 mdT : sda sdd
>> raid5 md0 : mdT sdb sdc
>> -> resync and then drop sda
>> raid1 mdT : sdd
>> raid5 md0 : mdT sdb sdc
>> -> remove internal mirror
>> raid5 md0 : sdd sdb sdc
>>
>>
>> Thoughts?
>>
>
> If there were a "migrate" option, it might work something like this:
> Given a migrate from sda to sdd, as you noted, a raid1 between sda
> and sdd needs to be created, and obviously all chunks of sdd need to
> be marked as needing rebuild. In addition, sda needs to be made
> read-only, to minimize the I/O and to prevent any errors that might
> come from a failed write, like failed sector relocates, etc. Also, if
> valid data for a chunk is on sdd, no read would be done from sda. I
> think there's relevant code in the "write-mostly" bits that could be
> used to keep I/O to sda to a minimum: no writes, and only mandatory
> reads when no valid chunk is on sdd yet. This is similar to recovery
> to a spare, except that most data will still be valid on the failing
> drive and doesn't need to be recreated; only unreadable data must be
> rebuilt the slow way.
It would be nice to reduce the load on sda as much as possible if it
is suspected of failing soon. But that is rather an optimization for the
case I described. To keep things simple, let's assume sda will be just
fine. So we just set up a raid1 over sda/sdd and do a rebuild. All
reads can go to sda, and all writes go to both disks. If sda gives an
error, the raid1 can fail completely; if sdd gives an error, kick it
from the raid1. Just like with a normal raid1.
Consider the case of wanting to regularly migrate data from one disk to
the spare disk so that all disks age the same. Say every three months
you migrate a disk to make a different disk the hot spare. You wouldn't
want extra considerations for the "to be spare" disk; it is not
suspected of failing soon.
> Care is needed for sda as well, so that if sdd fails during the
> migrate, a last-chance attempt to bring sda back to useful content can
> be made. I'm paranoid that way.
>
> Assuming the migrate works correctly, sda is removed from the array,
> and the superblock should be marked to reflect that. Now sdd is a part
> of the array, and assemble, at least using UUID, should work.
>
> I personally think that a migrate capability would be vastly useful,
> both for handling failing drives and just moving data to a better
For an actually failing drive (say it develops bad blocks but is mostly
still intact) it would be useful if special care were taken. A read
error on sda should cause the block to be recomputed from the raid5
parity and written to sdd. But I would be fine with kicking out sda if
it actually fails. We don't have a raid mode in the kernel that copes
with a raid1 where one mirror is flaky; it would need some new coding
in the bitmap for that.
> place. As you point out, the user commands are not *quite* as robust
> as an internal implementation could be, and are complex enough to
> invite user error. I certainly always write down the steps before
> doing a migrate, and if possible do it with the system booted from
> rescue media.
The problem with the userspace commands is that you can't do this
live; you have to stop the raid to set up the mirroring. Unless, that
is, you always run your raid on device-mapper-mapped drives just in
case you want to migrate in the future.
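As a sketch (using an md raid1 layer instead of device mapper, purely
for illustration), that preparation would mean building the array like
this in the first place:
% mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda missing
% mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb missing
% mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdc missing
% mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
and then living with the permanently degraded raid1 layers until a
migration is actually needed.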
I want this in the kernel so that you can take any running raid5/6 and
migrate: no downtime, no device mapper preparation beforehand.
MfG
Goswin
PS: Some customers swap out the drives in a raid when their warranty
expires, even if they still work perfectly. That is another use case.