From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Brown
Subject: Re: Swapping a disk without degrading an array
Date: Fri, 29 Jan 2010 22:19:04 +1100
Message-ID: <20100129221904.439e2afe@notabene>
References: <1264421475.30742.49.camel@test.apertos.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <1264421475.30742.49.camel@test.apertos.eu>
Sender: linux-raid-owner@vger.kernel.org
To: Michał Sawicz
Cc: linux-raid
List-Id: linux-raid.ids

On Mon, 25 Jan 2010 13:11:15 +0100 Michał Sawicz wrote:

> Hi list,
>
> This is something I've discussed on IRC, and we reached the conclusion
> that it might be useful, but the somewhat limited number of use cases
> might not warrant the effort of implementing it.
>
> What I have in mind is allowing a member of an array to be paired with a
> spare while the array is on-line. The spare disk would then be filled
> with exactly the same data and would, in the end, replace the active
> member. The replaced disk could then be hot-removed without the array
> ever going into degraded mode.
>
> I wanted to start a discussion on whether this makes sense at all, what
> the use cases might be, etc.
>

As has been noted, this is a really good idea.  It just doesn't seem to
get priority.  Volunteers???

So, time to start with a little design work.

1/ The start of the array *must* be recorded in the metadata.  If we try
   to create a transparent whole-device copy then we could get confused
   later.
   So let's (for now) decide not to support 0.90 metadata, and support
   this in 1.x metadata with:
     - a new feature_flag saying that live spares are present
     - the high bit set in dev_roles[] meaning that this device is a
       live spare and is only in_sync up to 'recovery_offset'

2/ In sysfs we currently identify devices with a symlink
       md/rd$N -> dev-$X
   For live-spare devices, this would be
       md/ls$N -> dev-$X

3/ We create a live spare by writing 'live-spare' to md/dev-$X/state
   and an appropriate value to md/dev-$X/recovery_start before setting
   md/dev-$X/slot

4/ When a device fails, if there is a live spare it instantly takes
   the place of the failed device.

5/ This needs to be implemented separately in raid10 and raid456.
   raid1 doesn't really need live spares, but I wouldn't be totally
   against implementing them there if it seemed helpful.

6/ There is no dynamic read balancing between a device and its
   live-spare.  If the live spare is in-sync up to the end of the read,
   we read from the live-spare, else from the main device.

7/ Writes transparently go to both the device and the live-spare,
   whether they are normal data writes or resync writes or whatever.

8/ In raid5.h, struct r5dev needs a second 'struct bio' and a second
   'struct bio_vec'.  'struct disk_info' needs a second mdk_rdev_t.

9/ In raid10.h, mirror_info needs another mdk_rdev_t, and the anonymous
   struct in r10bio_s needs another 'struct bio *'.

10/ Both struct r5dev and r10bio_s need some counter or flag so we can
    know when both writes have completed.

11/ For both r5 and r10, the 'recover' process needs to be enhanced to
    read only from the main device when a live-spare is being built.
    Obviously, if that read fails there needs to be a fall-back to read
    from elsewhere.

Probably lots more details, but that might be enough to get me (or
someone) started one day.

There would be lots of work to do in mdadm too, of course, to report on
these extensions and to assemble arrays with live-spares.
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html