* proactive-raid-disk-replacement
From: Michael Tokarev @ 2006-09-08 8:48 UTC
To: Linux RAID
Recently Dean Gaudet, in thread titled 'Feature
Request/Suggestion - "Drive Linking"', mentioned his
document, http://arctic.org/~dean/proactive-raid5-disk-replacement.txt
I've read it, and have some umm.. concerns. Here's why:
....
> mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
> mdadm /dev/md4 -r /dev/sdh1
> mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
> mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
> mdadm /dev/md4 --re-add /dev/md5
> mdadm /dev/md5 -a /dev/sdh1
>
> ... wait a few hours for md5 resync...
And here's the problem. While the new disk, sdh1, is resynced from
the old, probably failing disk sde1, chances are high that there will
be an unreadable block on sde1. And that means the whole thing
will not work -- md5 initially contains one working drive (sde1)
and one spare (sdh1), which is being converted (resynced) into a
working disk. But after a read error on sde1, md5 will contain one
failed drive and one spare -- for raid1 that's a fatal combination.
Yet at the same time, it would be perfectly easy to reconstruct this
failing block from the other component devices of md4.
That is to say: this way of replacing a disk in a software raid array
isn't much better than just removing the old drive and adding a new one.
And if the drive you're replacing is failing (according to SMART,
for example), this method is more likely to fail.
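(The "old way", for reference, being roughly:
  mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
  mdadm /dev/md4 -a /dev/sdh1
followed by the usual full raid5 rebuild onto sdh1.)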
/mjt
* Re: proactive-raid-disk-replacement
From: dean gaudet @ 2006-09-08 9:24 UTC
To: Michael Tokarev; +Cc: Linux RAID
On Fri, 8 Sep 2006, Michael Tokarev wrote:
> Recently Dean Gaudet, in thread titled 'Feature
> Request/Suggestion - "Drive Linking"', mentioned his
> document, http://arctic.org/~dean/proactive-raid5-disk-replacement.txt
>
> I've read it, and have some umm.. concerns. Here's why:
>
> ....
> > mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
> > mdadm /dev/md4 -r /dev/sdh1
> > mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
> > mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
> > mdadm /dev/md4 --re-add /dev/md5
> > mdadm /dev/md5 -a /dev/sdh1
> >
> > ... wait a few hours for md5 resync...
>
> And here's the problem. While the new disk, sdh1, is resynced from
> the old, probably failing disk sde1, chances are high that there will
> be an unreadable block on sde1. And that means the whole thing
> will not work -- md5 initially contains one working drive (sde1)
> and one spare (sdh1), which is being converted (resynced) into a
> working disk. But after a read error on sde1, md5 will contain one
> failed drive and one spare -- for raid1 that's a fatal combination.
>
> Yet at the same time, it would be perfectly easy to reconstruct this
> failing block from the other component devices of md4.
this statement is an argument for native support for this type of activity
in md itself.
> That is to say: this way of replacing a disk in a software raid array
> isn't much better than just removing the old drive and adding a new one.
hmm... i'm not sure i agree. in your proposal you're guaranteed to have
no redundancy while you wait for the new disk to sync in the raid5.
in my proposal the probability that you'll retain redundancy through the
entire process is non-zero. we can debate how non-zero it is, but
non-zero is greater than zero.
i'll admit it depends a heck of a lot on how long you wait to replace your
disks, but i prefer to replace mine well before they get to the point
where just reading the entire disk is guaranteed to result in problems.
> And if the drive you're replacing is failing (according to SMART,
> for example), this method is more likely to fail.
my practice is to run regular SMART long self tests, which tend to find
Current_Pending_Sectors (which are generally read errors waiting to
happen) and then launch a "repair" sync action... that generally drops the
Current_Pending_Sector back to zero. either through a realloc or just
simply rewriting the block. if it's a realloc then i consider if there's
enough of them to warrant replacing the disk...
so for me the chances of a read error while doing the raid1 thing aren't
as high as they could be...
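(concretely, that routine is roughly the following -- device and array names
are just examples:
  smartctl -t long /dev/sde                      # scheduled via smartd in practice
  smartctl -A /dev/sde | grep Current_Pending_Sector
  echo repair > /sys/block/md4/md/sync_action    # rewrites/reallocs the suspect sectors
)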
but yeah you've convinced me this solution isn't good enough.
-dean
* Re: proactive-raid-disk-replacement
From: Michael Tokarev @ 2006-09-08 10:47 UTC
To: dean gaudet; +Cc: Linux RAID
dean gaudet wrote:
> On Fri, 8 Sep 2006, Michael Tokarev wrote:
>
>> Recently Dean Gaudet, in thread titled 'Feature
>> Request/Suggestion - "Drive Linking"', mentioned his
>> document, http://arctic.org/~dean/proactive-raid5-disk-replacement.txt
>>
>> I've read it, and have some umm.. concerns. Here's why:
>>
>> ....
>>> mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
By the way, don't specify bitmap-chunk for an internal bitmap.
It's needed for a file-based (external) bitmap. With an internal
bitmap there is a fixed amount of space in the superblock for it, so
the bitmap chunk size is determined by the array size and that space.
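I.e., for an internal bitmap, plain
  mdadm -Gb internal /dev/md4
should be enough.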
>>> mdadm /dev/md4 -r /dev/sdh1
>>> mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
>>> mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
>>> mdadm /dev/md4 --re-add /dev/md5
>>> mdadm /dev/md5 -a /dev/sdh1
>>>
>>> ... wait a few hours for md5 resync...
>> And here's the problem. While the new disk, sdh1, is resynced from
>> the old, probably failing disk sde1, chances are high that there will
>> be an unreadable block on sde1. And that means the whole thing
>> will not work -- md5 initially contains one working drive (sde1)
>> and one spare (sdh1), which is being converted (resynced) into a
>> working disk. But after a read error on sde1, md5 will contain one
>> failed drive and one spare -- for raid1 that's a fatal combination.
>>
>> Yet at the same time, it would be perfectly easy to reconstruct this
>> failing block from the other component devices of md4.
>
> this statement is an argument for native support for this type of activity
> in md itself.
Yes, definitely.
>> That is to say: this way of replacing a disk in a software raid array
>> isn't much better than just removing the old drive and adding a new one.
>
> hmm... i'm not sure i agree. in your proposal you're guaranteed to have
> no redundancy while you wait for the new disk to sync in the raid5.
It's not a proposal per se, it's just another possible way (used by the
majority of users, I think, because it's way simpler ;)
> in my proposal the probability that you'll retain redundancy through the
> entire process is non-zero. we can debate how non-zero it is, but
> non-zero is greater than zero.
Yes, there will be no redundancy in "my" variant, guaranteed. And yes,
there is some probability of completing the whole of "your" process without a glitch.
> i'll admit it depends a heck of a lot on how long you wait to replace your
> disks, but i prefer to replace mine well before they get to the point
> where just reading the entire disk is guaranteed to result in problems.
>
>> And if the drive you're replacing is failing (according to SMART,
>> for example), this method is more likely to fail.
>
> my practice is to run regular SMART long self tests, which tend to find
> Current_Pending_Sectors (which are generally read errors waiting to
> happen) and then launch a "repair" sync action... that generally drops the
> Current_Pending_Sector back to zero. either through a realloc or just
> simply rewriting the block. if it's a realloc then i consider if there's
> enough of them to warrant replacing the disk...
>
> so for me the chances of a read error while doing the raid1 thing aren't
> as high as they could be...
So the whole thing goes this way:
0) do a SMART selftest ;)
1) do a repair pass over the whole array
2) copy the data from the failing drive to the new one
(using a temporary superblock-less array)
2a) if step 2 still fails, probably due to new bad sectors,
go the "old way": remove the failing drive and add the
new one.
That's 2x or 3x (or 4x counting the selftest, but that should be
done regardless) more work than just going the "old way" from the
beginning, but there is still some chance of completing it flawlessly
in 2 steps, without losing redundancy.
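In commands -- just a sketch, reusing the placeholder device names from the
recipe quoted above:
  # 0) SMART long selftest on the suspect drive
  smartctl -t long /dev/sde
  # 1) repair pass over the whole raid5
  echo repair > /sys/block/md4/md/sync_action
  # 2) then the superblock-less raid1 copy exactly as in the quoted recipe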
Too complicated and too long for most people, I'd say ;)
I can think of yet another way, which is only partly possible with
the current md code. In 3 variants.
1) Offline the array, stop it.
Make a copy of the drive using dd with conv=noerror (or similar),
noting the bad blocks.
Mark those bad blocks as dirty in the bitmap.
Assemble the array with the new drive, letting it resync to the new
drive the blocks which we were unable to copy previously.
This variant does not lose redundancy at all, but requires the array to
be off-line during the whole copy procedure. What's missing (which has
been discussed on linux-raid@ recently too) is the ability to mark those
"bad" blocks in the bitmap.
2) The same, but without offlining the array. Hot-remove the drive, copy
it to the new drive, flip the necessary bitmap bits, re-add the new drive,
and let the raid code resync the changed blocks (something might have changed
while the array was still active during the copy) and the missing ones.
This variant still loses redundancy, but not much of it, provided the bitmap
code works correctly.
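Again roughly, assuming the bitmap-bit flipping were possible:
  mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
  dd if=/dev/sde1 of=/dev/sdh1 bs=64k conv=noerror,sync
  # <flip the bitmap bits for the unreadable/changed regions -- the missing piece>
  mdadm /dev/md4 --re-add /dev/sdh1     # bitmap limits the resync to those regions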
3) The same as your way, with the difference that we tell md to *skip* and
ignore possible errors during resync (which is also not possible currently).
> but yeah you've convinced me this solution isn't good enough.
But all of these -- all 5 (so far ;) ways -- aren't nice ;)
/mjt
* Re: proactive-raid-disk-replacement
From: dean gaudet @ 2006-09-08 18:44 UTC
To: Michael Tokarev; +Cc: Linux RAID
On Fri, 8 Sep 2006, Michael Tokarev wrote:
> dean gaudet wrote:
> > On Fri, 8 Sep 2006, Michael Tokarev wrote:
> >
> >> Recently Dean Gaudet, in thread titled 'Feature
> >> Request/Suggestion - "Drive Linking"', mentioned his
> >> document, http://arctic.org/~dean/proactive-raid5-disk-replacement.txt
> >>
> >> I've read it, and have some umm.. concerns. Here's why:
> >>
> >> ....
> >>> mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
>
> By the way, don't specify bitmap-chunk for an internal bitmap.
> It's needed for a file-based (external) bitmap. With an internal
> bitmap there is a fixed amount of space in the superblock for it, so
> the bitmap chunk size is determined by the array size and that space.
yeah sorry, that was with an older version of mdadm which didn't calculate
the chunksize correctly for an internal bitmap on a large enough array... i
should have mentioned that in the post. it's fixed in newer mdadm.
> > my practice is to run regular SMART long self tests, which tend to find
> > Current_Pending_Sectors (which are generally read errors waiting to
> > happen) and then launch a "repair" sync action... that generally drops the
> > Current_Pending_Sector back to zero. either through a realloc or just
> > simply rewriting the block. if it's a realloc then i consider if there's
> > enough of them to warrant replacing the disk...
> >
> > so for me the chances of a read error while doing the raid1 thing aren't
> > as high as they could be...
>
> So the whole thing goes this way:
> 0) do a SMART selftest ;)
> 1) do a repair pass over the whole array
> 2) copy the data from the failing drive to the new one
> (using a temporary superblock-less array)
> 2a) if step 2 still fails, probably due to new bad sectors,
> go the "old way": remove the failing drive and add the
> new one.
>
> That's 2x or 3x (or 4x counting the selftest, but that should be
> done regardless) more work than just going the "old way" from the
> beginning, but there is still some chance of completing it flawlessly
> in 2 steps, without losing redundancy.
well it's more "work" but i don't actually manually launch the SMART
tests, smartd does that. i just notice when i get mail indicating
Current_Pending_Sectors has gone up.
but i'm starting to lean towards SMART short tests (in case they test
something i can't test with a full surface read) and regular crontabbed
rate-limited repair or check actions.
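(concretely i'm thinking of something along these lines -- illustrative only:
  # /etc/crontab: weekly check pass over the array
  0 3 * * 0  root  echo check > /sys/block/md4/md/sync_action
  # rate-limited so it doesn't kill live i/o:
  sysctl -w dev.raid.speed_limit_max=20000
)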
> 2) The same, but without offlining the array. Hot-remove the drive, copy
> it to the new drive, flip the necessary bitmap bits, re-add the new drive,
> and let the raid code resync the changed blocks (something might have changed
> while the array was still active during the copy) and the missing ones.
>
> This variant still loses redundancy, but not much of it, provided the bitmap
> code works correctly.
i like this method. it yields the minimal disk copy time because there's
no competition with the live traffic... and you can recover if another
disk has errors while you're doing the copy.
> 3) The same as your way, with the difference that we tell md to *skip* and
> ignore possible errors during resync (which is also not possible currently).
maybe we could hand it a bitmap to record the errors in... so we could
merge it with the raid5 bitmap later.
still not really the best solution though, is it?
we really want a solution similar to raid10...
-dean
* Re: proactive-raid-disk-replacement
From: Bodo Thiesen @ 2006-09-10 0:02 UTC
To: Linux RAID
Michael Tokarev <mjt@tls.msk.ru> wrote:
> > mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
> > mdadm /dev/md4 -r /dev/sdh1
> > mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
> > mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
> > mdadm /dev/md4 --re-add /dev/md5
> > mdadm /dev/md5 -a /dev/sdh1
> >
> > ... wait a few hours for md5 resync...
>
> And here's the problem. While the new disk, sdh1, is resynced from
> the old, probably failing disk sde1, chances are high that there will
> be an unreadable block on sde1.
So we need a way to feed the redundancy from the raid5 back to the raid1.
Here is a short 5-minute brainstorm I did to check whether it's possible
to manage this, and I think it is:
Requirements:
Any raid with parity of any kind needs to provide so-called "virtual
block devices", which carry the same data as the underlying block
devices the array is composed of. If an underlying block
device can't read a block, that block will be calculated from the
other raid disks and hence is still readable through the virtual
block device.
e.g. having the disks sda1 .. sde1 in a raid5 means the raid
provides not one new block device (e.g. /dev/md4 as in the example
above), but six (the one just mentioned, plus maybe
/dev/vsda1 .. /dev/vsde1, or /dev/mapper/vsda1 .. /dev/mapper/vsde1,
or even /dev/mapper/virtual/sda1 .. /dev/mapper/virtual/sde1). For
ease, I'll just call them vsdx1 here.
Reading any block from vsda1 will yield the same data as reading
from sda1 at any time (except in the case that reading from sda1
fails; then vsda1 will still deliver that data).
Now, construct the following nested raid structure:
sda1 + vsda1 + missing = /dev/md10 RAID1 w/o super block
sdb1 + vsdb1 + missing = /dev/md11 RAID1 w/o super block
sdc1 + vsdc1 + missing = /dev/md12 RAID1 w/o super block
sdd1 + vsdd1 + missing = /dev/md13 RAID1 w/o super block
sde1 + vsde1 + missing = /dev/md14 RAID1 w/o super block
md10 + md11 + md12 + md13 + md14 = /dev/md4 RAID5 optionally with sb
Problem:
As long as md4 is not active, vsdx1 is not available. So the arrays
md1x need to be created with 1 disk out of 3. After md4 was
assembled, vsdx1 needs to be added. Now we get another problem:
There must be no sync between sdx1 and vsdx1 (they are more or less
the same device). So there should be an option to mdadm like
--assume-sync for hot-add.
What we get:
As soon as we decide to replace a disk (like sde1 above) we just
hot-add sdh1 to the sde1 raid1 array. That array will start
resyncing. If a block now can't be read from sde1, it's just taken
from vsde1 (where that block will be reconstructed from the
raid5).
Once syncing to sdh1 has completed, sde1 may be removed from the
array.
We would lose redundancy at no time - the only lost redundancy is that of
the already failed sde1, which we can't work around anyway (except by using
raid6 etc.).
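To make it concrete -- purely hypothetical, since neither the vsdX1 devices
nor an --assume-sync option exist in md/mdadm today:
  # one leg of the structure above, superblock-less and degraded:
  mdadm --build /dev/md14 -ayes --level=1 --raid-devices=3 /dev/sde1 missing missing
  # after md4 has been assembled on top of md10..md14, hot-add the virtual
  # device (this is where the proposed --assume-sync would be needed):
  mdadm /dev/md14 -a --assume-sync /dev/vsde1
  # replacing sde1 then becomes:
  mdadm /dev/md14 -a /dev/sdh1                 # resync; read errors fall back to vsde1
  mdadm /dev/md14 -f /dev/sde1 -r /dev/sde1    # once the resync has finished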
This is only a brainstorm, and I don't know what internal effects could
cause problems -- like the resyncing process of the raid1 array reading a bad
block from sde1 and triggering a reconstruction via vsde1 while, in parallel,
the raid5 itself detects (e.g. as the result of a user-space read) that sde1
has failed and tries to write that block back to sde1, while in the raid1
the same rewrite is already pending ... problems upon problems, but the
devil is in the details, as ever ...
Regards, Bodo
* Re: proactive-raid-disk-replacement
From: Tuomas Leikola @ 2006-09-10 10:53 UTC
To: Bodo Thiesen; +Cc: Linux RAID
On 9/10/06, Bodo Thiesen <bothie@gmx.de> wrote:
> So we need a way to feed the redundancy from the raid5 back to the raid1.
<snip long explanation>
Sounds awfully complicated to me. Perhaps this is how it internally
works, but my 2 cents go to the option to gracefully remove a device
(migrating to a spare without losing redundancy) in the kernel (or
mdadm).
I'm thinking
mdadm /dev/raid-device -a /dev/new-disk
mdadm /dev/raid-device --graceful-remove /dev/failing-disk
also hopefully a path to do this instead of kicking (multiple) disks
when bad blocks occur.
* Re: proactive-raid-disk-replacement
From: Bill Davidsen @ 2006-09-16 14:28 UTC
To: Tuomas Leikola; +Cc: Bodo Thiesen, Linux RAID
Tuomas Leikola wrote:
> On 9/10/06, Bodo Thiesen <bothie@gmx.de> wrote:
>
>> So we need a way to feed the redundancy from the raid5 back to
>> the raid1.
>
> <snip long explanation>
>
> Sounds awfully complicated to me. Perhaps this is how it internally
> works, but my 2 cents go to the option to gracefully remove a device
> (migrating to a spare without losing redundancy) in the kernel (or
> mdadm).
>
> I'm thinking
>
> mdadm /dev/raid-device -a /dev/new-disk
> mdadm /dev/raid-device --graceful-remove /dev/failing-disk
>
> also hopefully a path to do this instead of kicking (multiple) disks
> when bad blocks occur.
Actually, an internal implementation is really needed if this is to be
generally useful to a non-guru. And it has other possible uses, as well.
if there were just a --migrate command:
mdadm --migrate /dev/md0 /dev/sda /dev/sdf
as an example for discussion, the whole process of not only moving the
data, but getting recovered information from the RAID array could be
done by software which does the right thing, creating superblocks, copy
UUID, etc. And as a last step it could invalidate the superblock on the
failing drive (so reboots would work right) and leave the array running
on the new drive.
But wait, there's more! Assume that I want to upgrade from a set of
250GB drives to 400GB drives. Using this feature I could replace a drive
at a time, then --grow the array. The process for doing that is complex
currently, and many manual steps invite errors.
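For instance -- with the hypothetical --migrate above -- the 250GB-to-400GB
upgrade might boil down to something like:
  # repeat for each member, waiting for each migration to finish:
  mdadm --migrate /dev/md0 /dev/sda /dev/sdf    # hypothetical command
  # once every member is a 400GB drive, grow the array onto the new space:
  mdadm --grow /dev/md0 --size=max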
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* RE: proactive-raid-disk-replacement
From: Guy @ 2006-09-16 17:47 UTC
To: 'Bill Davidsen', 'Tuomas Leikola'
Cc: 'Bodo Thiesen', 'Linux RAID'
} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of Bill Davidsen
} Sent: Saturday, September 16, 2006 10:29 AM
} To: Tuomas Leikola
} Cc: Bodo Thiesen; Linux RAID
} Subject: Re: proactive-raid-disk-replacement
}
} Tuomas Leikola wrote:
}
} > On 9/10/06, Bodo Thiesen <bothie@gmx.de> wrote:
} >
} >> So we need a way to feed the redundancy from the raid5 back to
} >> the raid1.
} >
} > <snip long explanation>
} >
} > Sounds awfully complicated to me. Perhaps this is how it internally
} > works, but my 2 cents go to the option to gracefully remove a device
} > (migrating to a spare without losing redundancy) in the kernel (or
} > mdadm).
} >
} > I'm thinking
} >
} > mdadm /dev/raid-device -a /dev/new-disk
} > mdadm /dev/raid-device --graceful-remove /dev/failing-disk
} >
} > also hopefully a path to do this instead of kicking (multiple) disks
} > when bad blocks occur.
}
}
} Actually, an internal implementation is really needed if this is to be
} generally useful to a non-guru. And it has other possible uses, as well.
} if there were just a --migrate command:
} mdadm --migrate /dev/md0 /dev/sda /dev/sdf
} as an example for discussion, the whole process of not only moving the
} data, but getting recovered information from the RAID array could be
} done by software which does the right thing, creating superblocks, copy
} UUID, etc. And as a last step it could invalidate the superblock on the
} failing drive (so reboots would work right) and leave the array running
} on the new drive.
}
} But wait, there's more! Assume that I want to upgrade from a set of
} 250GB drives to 400GB drives. Using this feature I could replace a drive
} at a time, then --grow the array. The process for doing that is complex
} currently, and many manual steps invite errors.
I like the migrate option, or whatever you want to call it. However, if the
disk is failing I would want to avoid reading from the failing disk and
reconstruct the data from the other disks instead, reading from the failing
disk only if a bad block is found on another disk. This would put less strain
on the failing disk, possibly allowing it to last long enough to finish the
migrate. Your data would remain redundant throughout the process.
However, if I am upgrading to larger disks it would be best to read from the
disk being replaced. You could even migrate many disks at the same time.
Your data would remain redundant throughout the process.
Guy
* Re: proactive-raid-disk-replacement
From: Bill Davidsen @ 2006-09-21 16:28 UTC
To: Guy; +Cc: 'Tuomas Leikola', 'Bodo Thiesen',
'Linux RAID'
Guy wrote:
>} -----Original Message-----
>} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>} owner@vger.kernel.org] On Behalf Of Bill Davidsen
>} Sent: Saturday, September 16, 2006 10:29 AM
>} To: Tuomas Leikola
>} Cc: Bodo Thiesen; Linux RAID
>} Subject: Re: proactive-raid-disk-replacement
>}
>} Tuomas Leikola wrote:
>}
>} > On 9/10/06, Bodo Thiesen <bothie@gmx.de> wrote:
>} >
>} >> So we need a way to feed the redundancy from the raid5 back to
>} >> the raid1.
>} >
>} > <snip long explanation>
>} >
>} > Sounds awfully complicated to me. Perhaps this is how it internally
>} > works, but my 2 cents go to the option to gracefully remove a device
>} > (migrating to a spare without losing redundancy) in the kernel (or
>} > mdadm).
>} >
>} > I'm thinking
>} >
>} > mdadm /dev/raid-device -a /dev/new-disk
>} > mdadm /dev/raid-device --graceful-remove /dev/failing-disk
>} >
>} > also hopefully a path to do this instead of kicking (multiple) disks
>} > when bad blocks occur.
>}
>}
>} Actually, an internal implementation is really needed if this is to be
>} generally useful to a non-guru. And it has other possible uses, as well.
>} if there were just a --migrate command:
>} mdadm --migrate /dev/md0 /dev/sda /dev/sdf
>} as an example for discussion, the whole process of not only moving the
>} data, but getting recovered information from the RAID array could be
>} done by software which does the right thing, creating superblocks, copy
>} UUID, etc. And as a last step it could invalidate the superblock on the
>} failing drive (so reboots would work right) and leave the array running
>} on the new drive.
>}
>} But wait, there's more! Assume that I want to upgrade from a set of
>} 250GB drives to 400GB drives. Using this feature I could replace a drive
>} at a time, then --grow the array. The process for doing that is complex
>} currently, and many manual steps invite errors.
>
>I like the migrate option, or whatever you want to call it. However, if the
>disk is failing I would want to avoid reading from the failing disk and
>reconstruct the data from the other disks instead, reading from the failing
>disk only if a bad block is found on another disk. This would put less strain
>on the failing disk, possibly allowing it to last long enough to finish the
>migrate. Your data would remain redundant throughout the process.
>
>
This is one of those "maybe" things: the data move would take longer,
increasing the chance of a total failure, etc. Someone else might speak
to this; I generally find that non-total failures usually show up as
problems writing bad sectors, while reading doesn't cause problems. Note
_usually_. In either case, during the migrate you wouldn't want to write
to the failing drive, so it would gradually fall out of currency anyway.
>However, if I am upgrading to larger disks it would be best to read from the
>disk being replaced. You could even migrate many disks at the same time.
>Your data would remain redundant throughout the process.
>
Many at a time presupposes room for the extra new drives, and makes the code
seem a lot more complex. I would hope this doesn't happen often enough
to make efficiency important.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Thread overview: 9+ messages
2006-09-08 8:48 proactive-raid-disk-replacement Michael Tokarev
2006-09-08 9:24 ` proactive-raid-disk-replacement dean gaudet
2006-09-08 10:47 ` proactive-raid-disk-replacement Michael Tokarev
2006-09-08 18:44 ` proactive-raid-disk-replacement dean gaudet
2006-09-10 0:02 ` proactive-raid-disk-replacement Bodo Thiesen
2006-09-10 10:53 ` proactive-raid-disk-replacement Tuomas Leikola
2006-09-16 14:28 ` proactive-raid-disk-replacement Bill Davidsen
2006-09-16 17:47 ` proactive-raid-disk-replacement Guy
2006-09-21 16:28 ` proactive-raid-disk-replacement Bill Davidsen