From: NeilBrown <neilb@suse.de>
To: Jonathan Brassow <jbrassow@redhat.com>
Cc: linux-raid@vger.kernel.org, agk@redhat.com
Subject: Re: [PATCH - v2] DM RAID: Add ability to restore transiently failed devices on resume
Date: Mon, 6 May 2013 16:00:35 +1000
Message-ID: <20130506160035.2b84bda5@notabene.brown>
In-Reply-To: <1367522364.23442.1.camel@f16>
On Thu, 02 May 2013 14:19:24 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:
> DM RAID: Add ability to restore transiently failed devices on resume
>
> This patch adds code to the resume function to check over the devices
> in the RAID array. If any are found to be marked as failed and their
> superblocks can be read, an attempt is made to reintegrate them into
> the array. This allows the user to refresh the array with a simple
> suspend and resume of the array - rather than having to load a
> completely new table, allocate and initialize all the structures and
> throw away the old instantiation.
>
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
>
> Index: linux-upstream/drivers/md/dm-raid.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/dm-raid.c
> +++ linux-upstream/drivers/md/dm-raid.c
> @@ -1574,12 +1574,54 @@ static void raid_postsuspend(struct dm_t
>
> static void raid_resume(struct dm_target *ti)
> {
> + int i;
> + uint64_t failed_devices, cleared_failed_devices = 0;
> + unsigned long flags;
> + struct dm_raid_superblock *sb;
> struct raid_set *rs = ti->private;
> + struct md_rdev *r;
>
> set_bit(MD_CHANGE_DEVS, &rs->md.flags);
> if (!rs->bitmap_loaded) {
> bitmap_load(&rs->md);
> rs->bitmap_loaded = 1;
> + } else {
> + /*
> + * A secondary resume while the device is active.
> + * Take this opportunity to check whether any failed
> + * devices are reachable again.
> + */
> + for (i = 0; i < rs->md.raid_disks; i++) {
> + r = &rs->dev[i].rdev;
> + if (test_bit(Faulty, &r->flags) && r->sb_page &&
> + sync_page_io(r, 0, r->sb_size,
> + r->sb_page, READ, 1)) {
> + DMINFO("Faulty device #%d has readable super"
> + "block. Attempting to revive it.", i);
> + r->raid_disk = i;
> + r->saved_raid_disk = i;
> + flags = r->flags;
> + clear_bit(Faulty, &r->flags);
> + clear_bit(WriteErrorSeen, &r->flags);
> + clear_bit(In_sync, &r->flags);
> + if (r->mddev->pers->hot_add_disk(r->mddev, r)) {
> + r->raid_disk = -1;
> + r->saved_raid_disk = -1;
> + r->flags = flags;
> + } else {
> + r->recovery_offset = 0;
> + cleared_failed_devices |= 1 << i;
> + }
> + }
> + }
> + if (cleared_failed_devices) {
> + rdev_for_each(r, &rs->md) {
> + sb = page_address(r->sb_page);
> + failed_devices = le64_to_cpu(sb->failed_devices);
> + failed_devices &= ~cleared_failed_devices;
> + sb->failed_devices = cpu_to_le64(failed_devices);
> + }
> + }
> }
>
> clear_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
> @@ -1588,7 +1630,7 @@ static void raid_resume(struct dm_target
>
> static struct target_type raid_target = {
> .name = "raid",
> - .version = {1, 5, 0},
> + .version = {1, 5, 1},
> .module = THIS_MODULE,
> .ctr = raid_ctr,
> .dtr = raid_dtr,
> Index: linux-upstream/drivers/md/raid1.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid1.c
> +++ linux-upstream/drivers/md/raid1.c
> @@ -1518,8 +1518,9 @@ static int raid1_add_disk(struct mddev *
> p = conf->mirrors+mirror;
> if (!p->rdev) {
>
> - disk_stack_limits(mddev->gendisk, rdev->bdev,
> - rdev->data_offset << 9);
> + if (mddev->gendisk)
> + disk_stack_limits(mddev->gendisk, rdev->bdev,
> + rdev->data_offset << 9);
>
> p->head_position = 0;
> rdev->raid_disk = mirror;
> @@ -1558,7 +1559,7 @@ static int raid1_add_disk(struct mddev *
> clear_bit(Unmerged, &rdev->flags);
> }
> md_integrity_add_rdev(rdev, mddev);
> - if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
> + if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
> queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
> print_conf(conf);
> return err;
> Index: linux-upstream/drivers/md/raid10.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid10.c
> +++ linux-upstream/drivers/md/raid10.c
> @@ -1806,15 +1806,17 @@ static int raid10_add_disk(struct mddev
> set_bit(Replacement, &rdev->flags);
> rdev->raid_disk = mirror;
> err = 0;
> - disk_stack_limits(mddev->gendisk, rdev->bdev,
> - rdev->data_offset << 9);
> + if (mddev->gendisk)
> + disk_stack_limits(mddev->gendisk, rdev->bdev,
> + rdev->data_offset << 9);
> conf->fullsync = 1;
> rcu_assign_pointer(p->replacement, rdev);
> break;
> }
>
> - disk_stack_limits(mddev->gendisk, rdev->bdev,
> - rdev->data_offset << 9);
> + if (mddev->gendisk)
> + disk_stack_limits(mddev->gendisk, rdev->bdev,
> + rdev->data_offset << 9);
>
> p->head_position = 0;
> p->recovery_disabled = mddev->recovery_disabled - 1;
> Index: linux-upstream/Documentation/device-mapper/dm-raid.txt
> ===================================================================
> --- linux-upstream.orig/Documentation/device-mapper/dm-raid.txt
> +++ linux-upstream/Documentation/device-mapper/dm-raid.txt
> @@ -222,3 +222,4 @@ Version History
> 1.4.2 Add RAID10 "far" and "offset" algorithm support.
> 1.5.0 Add message interface to allow manipulation of the sync_action.
> New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
> +1.5.1 Add ability to restore transiently failed devices on resume.
>
Applied, thanks. I assume this is heading for 3.11?
NeilBrown
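
The revive-and-roll-back logic in the quoted raid_resume() hunk can be hard to follow inside a diff, so here is a minimal userspace sketch of the same pattern. It is a model under stated assumptions, not the kernel code: struct model_dev, the DEV_* flag values, the hot_add_fn callback type, and the hot_add_ok/hot_add_fail helpers are all hypothetical stand-ins for struct md_rdev, the md flag bits, and pers->hot_add_disk(). One deliberate difference: the quoted patch writes `1 << i`, where the literal has type int, while the sketch shifts a 64-bit constant so that bit indexes of 32 and above stay well-defined against the uint64_t bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the md rdev flag bits. */
#define DEV_FAULTY  (1UL << 0)
#define DEV_IN_SYNC (1UL << 1)

/* Hypothetical model of one array member. */
struct model_dev {
    unsigned long flags;
    int raid_disk;              /* slot in the array, -1 when not a member */
    bool readable;              /* models a successful superblock read */
    uint64_t sb_failed_devices; /* this device's superblock copy of the bitmap */
};

/* Models pers->hot_add_disk(): returns 0 on success, nonzero on failure. */
typedef int (*hot_add_fn)(struct model_dev *dev);

static int hot_add_ok(struct model_dev *dev)   { (void)dev; return 0; }
static int hot_add_fail(struct model_dev *dev) { (void)dev; return -1; }

/*
 * Try to revive every faulty device whose superblock is readable.
 * Returns a bitmap of the slots that were revived and clears those
 * bits from the failed_devices bitmap of every device's superblock.
 */
static uint64_t revive_devices(struct model_dev *devs, size_t n,
                               hot_add_fn hot_add)
{
    uint64_t cleared = 0;
    size_t i;

    assert(n <= 64); /* one bitmap bit per device */

    for (i = 0; i < n; i++) {
        struct model_dev *d = &devs[i];
        unsigned long saved;

        if (!(d->flags & DEV_FAULTY) || !d->readable)
            continue;

        /* Remember the old state so a failed add can be undone. */
        saved = d->flags;
        d->raid_disk = (int)i;
        d->flags &= ~(DEV_FAULTY | DEV_IN_SYNC);

        if (hot_add(d)) {
            /* Roll back: the device stays out of the array. */
            d->raid_disk = -1;
            d->flags = saved;
        } else {
            /* 64-bit constant keeps the shift valid for i >= 32. */
            cleared |= UINT64_C(1) << i;
        }
    }

    if (cleared) {
        /* Every superblock copy must forget the revived slots. */
        for (i = 0; i < n; i++)
            devs[i].sb_failed_devices &= ~cleared;
    }
    return cleared;
}
```

Calling revive_devices() with hot_add_ok on a three-device set where only slot 1 is both faulty and readable returns a bitmap with bit 1 set and clears that bit from every superblock copy. Calling it with hot_add_fail leaves the flags, slot, and bitmaps exactly as they were.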
Thread overview: 7+ messages
2013-04-11 20:27 [PATCH] DM RAID: Add ability to restore transiently failed devices on resume Jonathan Brassow
2013-04-22 0:43 ` NeilBrown
2013-04-22 18:57 ` Brassow Jonathan
2013-04-24 6:39 ` NeilBrown
2013-05-02 19:19 ` [PATCH - v2] " Jonathan Brassow
2013-05-06 6:00 ` NeilBrown [this message]
2013-05-06 14:55 ` Brassow Jonathan