Re: Swapping a disk without degrading an array

linux-raid.vger.kernel.org archive mirror

From: Neil Brown <neilb@suse.de>
To: "Michał Sawicz" <michal@sawicz.net>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Swapping a disk without degrading an array
Date: Fri, 29 Jan 2010 22:19:04 +1100
Message-ID: <20100129221904.439e2afe@notabene>
In-Reply-To: <1264421475.30742.49.camel@test.apertos.eu>

On Mon, 25 Jan 2010 13:11:15 +0100
Michał Sawicz <michal@sawicz.net> wrote:

> Hi list,
> 
> This is something I've discussed on IRC, and we reached the conclusion
> that this might be useful, but the somewhat limited number of use cases
> might not warrant the implementation effort.
> 
> What I have in mind is allowing a member of an array to be paired with a
> spare while the array is on-line. The spare disk would then be filled
> with exactly the same data and would, in the end, replace the active
> member. The replaced disk could then be hot-removed without the array
> ever going into degraded mode.
> 
> I wanted to start a discussion on whether this makes sense at all and
> what the use cases might be.
> 

As has been noted, this is a really good idea.  It just doesn't seem to get
priority.  Volunteers ???

So, time to start with a little design work.

1/ The state of the array *must* be recorded in the metadata.  If we try to
   create a transparent whole-device copy then we could get confused later.
   So let's (for now) decide not to support 0.90 metadata, and support this
   in 1.x metadata with:
     - a new feature_flag saying that live spares are present
     - the high bit set in dev_roles[] means that this device is a live spare
       and is only in_sync up to 'recovery_offset'

2/ In sysfs we currently identify devices with a symlink
     md/rd$N -> dev-$X
   for live-spare devices, this would be
     md/ls$N -> dev-$X

3/ We create a live spare by writing 'live-spare' to md/dev-$X/state
   and an appropriate value to md/dev-$X/recovery_start before setting
   md/dev-$X/slot

4/ When a device fails, if it has a live spare, the live spare instantly takes
   the place of the failed device.

5/ This needs to be implemented separately in raid10 and raid456.
   raid1 doesn't really need live spares (a new disk can simply be added as an
   extra mirror and the old one removed once it is in sync), but I wouldn't be
   totally against implementing them if they seemed helpful.

6/ There is no dynamic read balancing between a device and its live-spare.
   If the live spare is in-sync up to the end of the read, we read from the
   live-spare, else from the main device.

7/ Writes transparently go to both the device and the live-spare, whether they
   are normal data writes or resync writes or whatever.

8/ In raid5.h struct r5dev needs a second 'struct bio' and a second
   'struct bio_vec'.
   'struct disk_info' needs a second mdk_rdev_t.

9/ In raid10.h, mirror_info needs another mdk_rdev_t, and the anonymous struct
   in r10bio_s needs another 'struct bio *'.

10/ Both struct r5dev and r10bio_s need some counter or flag so we can know
    when both writes have completed.

11/ For both r5 and r10, the 'recover' process needs to be enhanced to just
    read from the main device when a live-spare is being built.
    Obviously if this fails there needs to be a fall-back to reading from
    elsewhere.
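To make item 1 a little more concrete, here is a minimal C sketch of how a superblock loader might decode a dev_roles[] entry under this scheme. The enum, the helper names and LIVE_SPARE_BIT are all hypothetical; the only values taken from existing 1.x metadata are 0xffff (spare) and 0xfffe (faulty). Both of those already have the high bit set, so they have to be tested before the live-spare bit, and a live spare can only shadow slots below 0x7ffe.

```c
#include <assert.h>
#include <stdint.h>

/* Special role values already reserved by v1.x metadata. */
#define ROLE_VAL_SPARE  0xffffu
#define ROLE_VAL_FAULTY 0xfffeu

/* Hypothetical: bit 15 marks a live spare shadowing slot (role & 0x7fff). */
#define LIVE_SPARE_BIT  0x8000u

enum role_kind { ROLE_ACTIVE, ROLE_SPARE, ROLE_FAULTY, ROLE_LIVE_SPARE };

struct decoded_role {
    enum role_kind kind;
    unsigned slot;          /* meaningful for ROLE_ACTIVE and ROLE_LIVE_SPARE */
};

static struct decoded_role decode_role(uint16_t role)
{
    struct decoded_role r = { ROLE_ACTIVE, role };

    /* The reserved values have bit 15 set too, so test them first. */
    if (role == ROLE_VAL_SPARE) {
        r.kind = ROLE_SPARE;
        r.slot = 0;
    } else if (role == ROLE_VAL_FAULTY) {
        r.kind = ROLE_FAULTY;
        r.slot = 0;
    } else if (role & LIVE_SPARE_BIT) {
        r.kind = ROLE_LIVE_SPARE;
        r.slot = role & ~LIVE_SPARE_BIT;
    }
    return r;
}

/* Build the role for a live spare of 'slot'.  Slots 0x7ffe and 0x7fff
 * would collide with the reserved faulty/spare values. */
static uint16_t encode_live_spare(unsigned slot)
{
    assert(slot < 0x7ffe);
    return (uint16_t)(slot | LIVE_SPARE_BIT);
}
```

The feature_flag from item 1 would then tell older code not to try to interpret these roles at all.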
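Items 4 and 8 together amount to a per-slot pair of devices. The sketch below uses hypothetical stand-in types (sketch_rdev, sketch_slot) rather than the real mdk_rdev_t and disk_info/mirror_info, and shows a failure of the main device promoting the live spare into its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for mdk_rdev_t. */
struct sketch_rdev {
    int faulty;
    sector_t recovery_offset;   /* in sync below this sector */
};

/* Hypothetical stand-in for raid5's disk_info / raid10's mirror_info,
 * carrying the second rdev pointer from item 8. */
struct sketch_slot {
    struct sketch_rdev *rdev;        /* main device */
    struct sketch_rdev *live_spare;  /* NULL when none is being built */
};

/* Mark the main device failed; if a live spare exists it takes the slot
 * immediately.  Returns 1 if it did, 0 if the slot is now empty. */
static int slot_fail_main(struct sketch_slot *s)
{
    /* A slot should never hold a live spare without a main device. */
    assert(s->rdev != NULL || s->live_spare == NULL);
    if (s->rdev)
        s->rdev->faulty = 1;
    s->rdev = s->live_spare;
    s->live_spare = NULL;
    return s->rdev != NULL;
}
```

The promoted device keeps its recovery_offset, so presumably anything it had not yet copied would still have to be rebuilt from the other members, just as for an ordinary spare part-way through recovery.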
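The read rule in item 6 is simple enough to write down directly. A minimal sketch, assuming recovery_offset is the first sector not yet copied to the live spare; the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

enum read_target { READ_MAIN, READ_LIVE_SPARE };

/* Sectors [0, recovery_offset) are assumed to be in sync on the live spare. */
static enum read_target choose_read_target(sector_t start, unsigned nr_sectors,
                                           int has_live_spare,
                                           sector_t recovery_offset)
{
    assert(nr_sectors > 0);
    /* No balancing: the live spare is used only if the whole read is in sync. */
    if (has_live_spare && start + nr_sectors <= recovery_offset)
        return READ_LIVE_SPARE;
    return READ_MAIN;
}
```

A read that straddles recovery_offset goes to the main device in full rather than being split.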
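For items 7 and 10, the "counter or flag" could be as little as a count of outstanding writes that the last completion drops to zero. The sketch below uses userspace C11 atomics; in the kernel this would more naturally be an atomic_t with atomic_dec_and_test(). struct dual_write and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counter item 10 wants in r5dev / r10bio_s. */
struct dual_write {
    atomic_int pending;   /* writes issued but not yet completed */
};

/* One write to the main device, plus one to the live spare if present. */
static void dual_write_start(struct dual_write *w, bool has_live_spare)
{
    atomic_init(&w->pending, has_live_spare ? 2 : 1);
}

/* Call from each write's completion; returns true only for the last one,
 * which is where the request as a whole may be completed. */
static bool dual_write_endio(struct dual_write *w)
{
    int before = atomic_fetch_sub(&w->pending, 1);
    assert(before > 0);
    return before == 1;
}
```

Because the count is set before either write is issued, it does not matter which of the two completes first.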

Probably lots more details, but that might be enough to get me (or someone)
started one day.

There would also be lots of work to do in mdadm, of course, to report on these
extensions and to assemble arrays with live-spares.

NeilBrown

Thread overview: 11+ messages
2010-01-25 12:11 Swapping a disk without degrading an array Michał Sawicz
2010-01-25 12:25 ` Majed B.
2010-01-25 12:53   ` Mikael Abrahamsson
2010-01-25 14:44 ` Michał Sawicz
2010-01-25 14:51 ` Asdo
2010-01-25 17:40 ` Goswin von Brederlow
2010-01-29 11:19 ` Neil Brown [this message]
2010-01-29 15:35   ` Goswin von Brederlow
2010-01-31 15:34     ` Asdo
2010-01-31 16:33       ` Gabor Gombas
2010-01-31 17:32         ` Goswin von Brederlow