* Swapping a disk without degrading an array
From: Michał Sawicz @ 2010-01-25 12:11 UTC (permalink / raw)
To: linux-raid
Hi list,
This is something I've discussed on IRC, and we reached the conclusion
that it might be useful, but the somewhat limited number of use cases
might not warrant the effort of implementing it.
What I have in mind is allowing a member of an array to be paired with a
spare while the array is on-line. The spare disk would then be filled
with exactly the same data and would, in the end, replace the active
member. The replaced disk could then be hot-removed without the array
ever going into degraded mode.
I wanted to start a discussion on whether this makes sense at all, what
the use cases could be, and so on.
--
Cheers
Michał (Saviq) Sawicz
* Re: Swapping a disk without degrading an array
From: Majed B. @ 2010-01-25 12:25 UTC (permalink / raw)
To: Michał Sawicz; +Cc: linux-raid
There's a technique called an active spare, which is already available on
some hardware RAID controllers. It keeps the hot spare in sync with
the array, so that in the event of a disk failure the spare kicks in
immediately, without wasting time on a resync.
I think what you're proposing is similar to the following scenario:
array0: (assume raid5): disk0, disk1, disk2, disk3(spare)
array1: (raid1): disk0, disk3
Though I'm not sure whether it's feasible to nest RAIDs or have a disk
be a member of two arrays at the same time.
I think this was proposed before, but I don't know about its priority.
2010/1/25 Michał Sawicz <michal@sawicz.net>:
> Hi list,
>
> This is something I've discussed on IRC, and we reached the conclusion
> that it might be useful, but the somewhat limited number of use cases
> might not warrant the effort of implementing it.
>
> What I have in mind is allowing a member of an array to be paired with a
> spare while the array is on-line. The spare disk would then be filled
> with exactly the same data and would, in the end, replace the active
> member. The replaced disk could then be hot-removed without the array
> ever going into degraded mode.
>
> I wanted to start a discussion on whether this makes sense at all, what
> the use cases could be, and so on.
>
> --
> Cheers
> Michał (Saviq) Sawicz
>
--
Majed B.
* Re: Swapping a disk without degrading an array
From: Mikael Abrahamsson @ 2010-01-25 12:53 UTC (permalink / raw)
To: Majed B.; +Cc: Michał Sawicz, linux-raid
On Mon, 25 Jan 2010, Majed B. wrote:
> Though I'm not sure whether it's feasible to nest RAIDs or have a disk be
> a member of two arrays at the same time.
I think the proposal is for the scenario where a drive is being upgraded to
a larger one.
So:
1. Add spare X.
2. Tell mdadm to replace drive N with the new spare.
3. The data on N is copied to X online.
4. When the copy is done, N and X contain the same data, and mdadm
   converts N to a spare.
5. Hot-remove N.
This means that if I want to upgrade from one drive size to a larger one,
I can do that without ever degrading the array.
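A minimal sketch of how such a replace workflow might look from the shell.
The --replace/--with syntax is only an assumption about what the interface
could look like (mdadm has no such operation as of this thread), and the
device names are placeholders:

  # add the new, larger disk as the spare X
  mdadm /dev/md0 --add /dev/sdx1
  # hypothetical: copy member N onto X while N stays active in the array
  mdadm /dev/md0 --replace /dev/sdn1 --with /dev/sdx1
  # once the copy finishes, N is demoted to a spare and can be removed
  mdadm /dev/md0 --remove /dev/sdn1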
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Swapping a disk without degrading an array
From: Michał Sawicz @ 2010-01-25 14:44 UTC (permalink / raw)
To: linux-raid
On Mon, 2010-01-25 at 13:11 +0100, Michał Sawicz wrote:
> I wanted to start a discussion on whether this makes sense at all, what
> the use cases could be, and so on.
It seems it has been on the mdadm to-do list for a year now:
http://neil.brown.name/blog/20090129234603
This suggests my idea wasn't entirely stupid :)
--
Cheers
Michał (Saviq) Sawicz
* Re: Swapping a disk without degrading an array
From: Asdo @ 2010-01-25 14:51 UTC (permalink / raw)
To: Michał Sawicz; +Cc: linux-raid
Michał Sawicz wrote:
> ... I wanted to start a discussion on whether this makes sense at all, what
> the use cases could be, and so on. ...
This looks like a great feature to me; you get my vote.
I was also thinking about something similar. This is probably the most
desirable MD feature request for me right now.
Use cases could be:
- 1 - The obvious one: you are seeing early warning signs (correctable
read errors or SMART errors) on a disk and you want to replace it
without making the array degraded and temporarily vulnerable.
- 2 - Recovering a really bad array that has read errors in different
places on multiple disks (replacing one disk at a time with the feature
you suggest): while filling each sector of the hot-spare, the algorithm
has two places to read the data from: first it can try to read from the
drive being replaced, and if that returns a read error it can
reconstruct the data from parity. Currently there is no other way to do
this with this level of redundancy, AFAIK, at least not automatically
and not with the array online.
Consider that if you have a bad array as described, doing a full scrub
would take the array down, i.e. the scrub would never finish
successfully and the new drive could never be filled with data. With
the feature you suggest there is no scrub of the whole array: data is
taken from the drive being replaced for all sectors (that's the only
disk being scrubbed), except possibly for the few defective sectors on
that disk, for which parity is used.
Thank you
Asdo
* Re: Swapping a disk without degrading an array
From: Goswin von Brederlow @ 2010-01-25 17:40 UTC (permalink / raw)
To: Michał Sawicz; +Cc: linux-raid
Michał Sawicz <michal@sawicz.net> writes:
> Hi list,
>
> This is something I've discussed on IRC, and we reached the conclusion
> that it might be useful, but the somewhat limited number of use cases
> might not warrant the effort of implementing it.
>
> What I have in mind is allowing a member of an array to be paired with a
> spare while the array is on-line. The spare disk would then be filled
> with exactly the same data and would, in the end, replace the active
> member. The replaced disk could then be hot-removed without the array
> ever going into degraded mode.
>
> I wanted to start a discussion on whether this makes sense at all, what
> the use cases could be, and so on.
I had that discussion with Neil last year. Summary: it totally makes
sense and is not that hard to implement, but it doesn't have a high
priority.
You can sort of do it today with two short downtimes: shut down the
raid, set up a dm-mirror target, restart the raid, wait for the mirror
to complete, then shut down again and undo the dm-mirror. Instead of
dm-mirror you can also use a superblock-less raid1. You get into trouble
on a crash though, unless the superblock is mirrored last, because
otherwise the wrong (incomplete) disk might be added to the raid on
boot.
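A rough sketch of the superblock-less raid1 variant of that workaround,
assuming placeholder device names (sdb1 is the member being copied to
sdc1, sda1/sdd1 are the other members); details such as the resync
direction and whether --assume-clean is needed should be checked before
trying this on real data:

  # first short downtime: stop the array and wrap the member in a raid1
  mdadm --stop /dev/md0
  # --build creates the mirror without writing any superblock
  mdadm --build /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
  # reassemble the array with the mirror standing in for the old member
  mdadm --assemble /dev/md0 /dev/sda1 /dev/md1 /dev/sdd1
  # ...wait for the raid1 sync to finish (watch /proc/mdstat)...
  # second short downtime: swap the new disk in and drop the mirror
  mdadm --stop /dev/md0
  mdadm --stop /dev/md1
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1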
Besides replacing a disk suspected of failing soon, there is also a
second use case: balancing the wear of the active and spare disks. If
you buy 6 new disks and create a 5-disk-plus-spare raid5, the spare
remains unused while the other disks wear down. So every now and then
it would be nice to rotate the spare disk so the wear is distributed
better. This could be done in parallel with the monthly raid check,
where you read and verify the full disk data anyway. Copying one disk
out to the spare at the same time and switching them at the end would
cost little extra (given enough controller bandwidth it wouldn't even
slow the check down).
MfG
Goswin
* Re: Swapping a disk without degrading an array
From: Neil Brown @ 2010-01-29 11:19 UTC (permalink / raw)
To: Michał Sawicz; +Cc: linux-raid
On Mon, 25 Jan 2010 13:11:15 +0100
Michał Sawicz <michal@sawicz.net> wrote:
> Hi list,
>
> This is something I've discussed on IRC, and we reached the conclusion
> that it might be useful, but the somewhat limited number of use cases
> might not warrant the effort of implementing it.
>
> What I have in mind is allowing a member of an array to be paired with a
> spare while the array is on-line. The spare disk would then be filled
> with exactly the same data and would, in the end, replace the active
> member. The replaced disk could then be hot-removed without the array
> ever going into degraded mode.
>
> I wanted to start a discussion on whether this makes sense at all, what
> the use cases could be, and so on.
>
As has been noted, this is a really good idea. It just doesn't seem to get
priority. Volunteers ???
So, time to start with a little design work.
1/ The start of the array *must* be recorded in the metadata. If we try to
create a transparent whole-device copy then we could get confused later.
So let's (for now) decide not to support 0.90 metadata, and support this
in 1.x metadata with:
- a new feature_flag saying that live spares are present
- the high bit set in dev_roles[] means that this device is a live spare
and is only in_sync up to 'recovery_offset'
2/ in sysfs we currently identify devices with a symlink
md/rd$N -> dev-$X
for live-spare devices, this would be
md/ls$N -> dev-$X
3/ We create a live spare by writing 'live-spare' to md/dev-$X/state
and an appropriate value to md/dev-$X/recovery_start before setting
md/dev-$X/slot
4/ When a device fails, if there is a live spare it instantly takes
the place of the failed device.
5/ This needs to be implemented separately in raid10 and raid456.
raid1 doesn't really need live spares but I wouldn't be totally against
implementing them if it seemed helpful.
6/ There is no dynamic read balancing between a device and its live-spare.
If the live spare is in-sync up to the end of the read, we read from the
live-spare, else from the main device.
7/ writes transparently go to both the device and the live-spare, whether they
are normal data writes or resync writes or whatever.
8/ In raid5.h struct r5dev needs a second 'struct bio' and a second
'struct bio_vec'.
'struct disk_info' needs a second mdk_rdev_t.
9/ in raid10.h mirror_info needs another mdk_rdev_t and the anon struct in
r10bio_s needs another 'struct bio *'.
10/ Both struct r5dev and r10bio_s need some counter or flag so we can know
when both writes have completed.
11/ For both r5 and r10, the 'recover' process needs to be enhanced to just
read from the main device when a live-spare is being built.
Obviously if this fails there needs to be a fall-back to read from
elsewhere.
Probably lots more details, but that might be enough to get me (or someone)
started one day.
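A sketch of how the sysfs interface proposed in points 2 and 3 might be
driven from a shell, assuming the attributes behave exactly as described
above; none of this exists yet, and the device name and slot number are
placeholders:

  # /dev/sdx1 has already been added to md0 as a plain spare
  cd /sys/block/md0/md
  # mark the spare as a live spare...
  echo live-spare > dev-sdx1/state
  # ...that has not been recovered at all yet (one "appropriate value")...
  echo 0 > dev-sdx1/recovery_start
  # ...shadowing the device currently in slot 2
  echo 2 > dev-sdx1/slot
  # per point 2, md/ls2 -> dev-sdx1 would then appear alongside md/rd2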
There would of course be lots of work to do in mdadm too, to report on these
extensions and to assemble arrays with live-spares.
NeilBrown
* Re: Swapping a disk without degrading an array
From: Goswin von Brederlow @ 2010-01-29 15:35 UTC (permalink / raw)
To: linux-raid
Neil Brown <neilb@suse.de> writes:
> So, time to start with a little design work.
>
> 1/ The start of the array *must* be recorded in the metadata. If we try to
> create a transparent whole-device copy then we could get confused later.
> So let's (for now) decide not to support 0.90 metadata, and support this
> in 1.x metadata with:
> - a new feature_flag saying that live spares are present
> - the high bit set in dev_roles[] means that this device is a live spare
> and is only in_sync up to 'recovery_offset'
Could the bitmap be used here too?
> 2/ in sysfs we currently identify devices with a symlink
> md/rd$N -> dev-$X
> for live-spare devices, this would be
> md/ls$N -> dev-$X
>
> 3/ We create a live spare by writing 'live-spare' to md/dev-$X/state
> and an appropriate value to md/dev-$X/recovery_start before setting
> md/dev-$X/slot
>
> 4/ When a device fails, if there is a live spare it instantly takes
> the place of the failed device.
Some cases:
1) The mirroring is still going and the error is in an in-sync region.
I think setting the drive to write-mostly and keeping it is better than
kicking the drive and requiring a re-sync to get the live-spare active.
2) The mirroring is still going and the error is in an out-of-sync region.
If the error is caused by the mirroring itself then the block can also
be restored from parity; then go to (1). But if it happens often, fail
the drive anyway, as the errors cost too much time. Otherwise, unless we
have bitmaps to first repair the region covered by the bit and then go
to (1), there is not much we can do here. Fail the drive.
It would be good to note that the disk being mirrored had faults, and to
fail it immediately when the mirroring is complete.
Also, the "often" above should be configurable and include a "never"
option. Say you have two disks that are damaged at different locations.
By creating a live-spare with "never", the mirroring would eventually
succeed and repair the raid, while kicking a disk would cause data loss.
3) The mirroring is complete.
There is no sense keeping the broken disk: fail it and use the live-spare
instead. Mdadm should probably have an option to automatically remove
the old disk once the mirroring of a live spare is done.
> 5/ This needs to be implemented separately in raid10 and raid456.
> raid1 doesn't really need live spares but I wouldn't be totally against
> implementing them if it seemed helpful.
Raid1 would only need the "create new mirror without failing existing
disks" mode. The disks in a raid1 might all be damaged, but in different
locations.
> 6/ There is no dynamic read balancing between a device and its live-spare.
> If the live spare is in-sync up to the end of the read, we read from the
> live-spare, else from the main device.
So the old drive is write-mostly. That makes (1) above irrelevant.
> 7/ writes transparently go to both the device and the live-spare, whether they
> are normal data writes or resync writes or whatever.
>
> 8/ In raid5.h struct r5dev needs a second 'struct bio' and a second
> 'struct bio_vec'.
> 'struct disk_info' needs a second mdk_rdev_t.
>
> 9/ in raid10.h mirror_info needs another mdk_rdev_t and the anon struct in
> r10bio_s needs another 'struct bio *'.
>
> 10/ Both struct r5dev and r10bio_s need some counter or flag so we can know
> when both writes have completed.
>
> 11/ For both r5 and r10, the 'recover' process needs to be enhanced to just
> read from the main device when a live-spare is being built.
> Obviously if this fails there needs to be a fall-back to read from
> elsewhere.
Shouldn't the 'recover' process read from the live-spare where the
live-spare is already in sync, and from the main drive otherwise?
> Probably lots more details, but that might be enough to get me (or someone)
> started one day.
>
> There would of course be lots of work to do in mdadm too, to report on these
> extensions and to assemble arrays with live-spares.
>
> NeilBrown
MfG
Goswin
* Re: Swapping a disk without degrading an array
From: Asdo @ 2010-01-31 15:34 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: linux-raid, Neil Brown
Goswin von Brederlow wrote:
> Neil Brown <neilb@suse.de> writes:
> [full quote of the two preceding messages snipped]
The implementation you are proposing is great and very featureful.
However, for a first implementation there is probably a simpler
alternative which gives most of the benefits and still leaves you the
chance to add the rest of the features afterwards.
This would be my suggestion:
1/ The live-spare gets filled with data without recording anything in any
superblock. If there is a power failure and a reboot, the new MD will
know nothing about this; the process has to be restarted.
2/ When the live-spare is full of data, you switch the superblocks in a
quick (almost atomic) operation. You remove the old device from the
array and you add the new device in its place.
This doesn't support two copies of a drive running together, but I guess
most people would be using hot-device-replace simply as a replacement
for "fail" (also see my other post in the thread "Re: Read errors on raid5
ignored, array still clean .. then disaster !!"). Judging from what I have
read recently on the ML, it already has great value.
What I'd really suggest for the algorithm is: while reading the old
device for replication, don't fail and kick out the old device if there
are read errors on a few sectors. Just read from parity and go on.
Unless the old drive is in a really disastrous state (it doesn't respond
to anything, times out too many times, or was kicked by the controller),
try to fail the old device only at the end.
If the parity read also fails, fail just the hot-device-replace operation
(and log something to dmesg), not the whole old device (failing the
whole old device would trigger a full rebuild and eventually bring down
the array). The rationale is that hot-device-replace should be a safe
operation that the sysadmin can run without anxiety. If the sysadmin
knows that the operation can bring down the array, the purpose of this
feature would be partly defeated, IMHO.
E.g. in the case of raid6, the algorithm would be:
  for each block:
      read from the disk being replaced and write the block to the hot-spare
      if the read fails:
          read from all other disks
          if at least N-2 reads succeed without error:
              compute the block and write it to the hot-spare
          else:
              fail the hot-device-replace operation (I suggest leaving
              the array up), log something to dmesg, and let mdadm send
              an email. Also see below (*).
The hot-device-replace feature would make a great addition, especially if
coupled with the "threshold for max corrected read errors" feature:
hot-device-replace should be triggered when the threshold for max
corrected read errors is surpassed. See the motivation in my other
post in the thread "Re: Read errors on raid5 ignored, array still clean ..
then disaster !!".
(*) If the "threshold for max corrected read errors" is surpassed by more
than 1, it means more than one hot-device-replace action has failed
due to too many read errors on the same stripe. I suggest still keeping
the array up and not failing any disks; however, I hope mdadm is set to
send emails... If the drive then shows an uncorrectable read error there
is probably no choice other than failing it, but in that case the array
will certainly go down.
Summing up, I suggest really "failing" the drive (removing it from the
array) only if the "threshold for max corrected read errors" is surpassed
AND an uncorrectable read error happens. When just one of the two things
happens, I suggest merely trying to trigger a hot-device-replace.
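For reference, a shell sketch of the existing sysfs knobs that come
closest to the threshold discussed here; their exact semantics should be
checked against the kernel in use, and treating them as the trigger for
a hot-device-replace is purely an assumption. The device name is a
placeholder:

  # corrected read errors md has counted on one member so far
  cat /sys/block/md0/md/dev-sdb1/errors
  # array-wide limit on corrected read errors before md fails a device
  cat /sys/block/md0/md/max_read_errors
  # raise the limit if the default is considered too aggressive
  echo 50 > /sys/block/md0/md/max_read_errors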
Thank you
Asdo
* Re: Swapping a disk without degrading an array
From: Gabor Gombas @ 2010-01-31 16:33 UTC (permalink / raw)
To: Asdo; +Cc: Goswin von Brederlow, linux-raid, Neil Brown
On Sun, Jan 31, 2010 at 04:34:03PM +0100, Asdo wrote:
> 1/ The live-spare gets filled with data without recording anything in
> any superblock. If there is a power failure and a reboot, the new MD
> will know nothing about this; the process has to be restarted.
IMHO MD must know about the copy and it must know not to use the new
device before the copying is completed. Otherwise after a reboot mdadm
may either import the new half-written spare instead of the real one if
the superblock is already copied, or other tools like LVM may start
using the new half-written spare instead of the RAID if the MD
superblock is still missing.
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
* Re: Swapping a disk without degrading an array
From: Goswin von Brederlow @ 2010-01-31 17:32 UTC (permalink / raw)
To: Gabor Gombas; +Cc: Asdo, Goswin von Brederlow, linux-raid, Neil Brown
Gabor Gombas <gombasg@sztaki.hu> writes:
> On Sun, Jan 31, 2010 at 04:34:03PM +0100, Asdo wrote:
>
>> 1/ The live-spare gets filled with data without recording anything in
>> any superblock. If there is a power failure and a reboot, the new MD
>> will know nothing about this; the process has to be restarted.
>
> IMHO MD must know about the copy and it must know not to use the new
> device before the copying is completed. Otherwise after a reboot mdadm
> may either import the new half-written spare instead of the real one if
> the superblock is already copied, or other tools like LVM may start
> using the new half-written spare instead of the RAID if the MD
> superblock is still missing.
>
> Gabor
No, that is exactly what he means to avoid.
His suggestion is that at the start the metadata area of the live-spare
is kept as is, describing a simple unused spare. Only the in-memory data
records that it actually is a live-spare, and only the data part of the
device is mirrored.
Then at the end you remove the old disk, add the live-spare and record
the change in the metadata of all drives in a semi-atomic way. If
anything interrupts the operation before this, the live-spare will still
be recognised as a normal spare when the raid is reassembled.
MfG
Goswin
Thread overview: 11+ messages
2010-01-25 12:11 Swapping a disk without degrading an array Michał Sawicz
2010-01-25 12:25 ` Majed B.
2010-01-25 12:53 ` Mikael Abrahamsson
2010-01-25 14:44 ` Michał Sawicz
2010-01-25 14:51 ` Asdo
2010-01-25 17:40 ` Goswin von Brederlow
2010-01-29 11:19 ` Neil Brown
2010-01-29 15:35 ` Goswin von Brederlow
2010-01-31 15:34 ` Asdo
2010-01-31 16:33 ` Gabor Gombas
2010-01-31 17:32 ` Goswin von Brederlow