Re: [PATCH - v2] DM RAID: Add ability to restore transiently failed devices on resume

From: NeilBrown <neilb@suse.de>
To: Jonathan Brassow <jbrassow@redhat.com>
Cc: linux-raid@vger.kernel.org, agk@redhat.com
Subject: Re: [PATCH - v2] DM RAID: Add ability to restore transiently failed devices on resume
Date: Mon, 6 May 2013 16:00:35 +1000
Message-ID: <20130506160035.2b84bda5@notabene.brown>
In-Reply-To: <1367522364.23442.1.camel@f16>

On Thu, 02 May 2013 14:19:24 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:

> DM RAID: Add ability to restore transiently failed devices on resume
> 
> This patch adds code to the resume function to check over the devices
> in the RAID array.  If any are found to be marked as failed and their
> superblocks can be read, an attempt is made to reintegrate them into
> the array.  This allows the user to refresh the array with a simple
> suspend and resume of the array - rather than having to load a
> completely new table, allocate and initialize all the structures and
> throw away the old instantiation.
> 
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
> 
> Index: linux-upstream/drivers/md/dm-raid.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/dm-raid.c
> +++ linux-upstream/drivers/md/dm-raid.c
> @@ -1574,12 +1574,54 @@ static void raid_postsuspend(struct dm_t
>  
>  static void raid_resume(struct dm_target *ti)
>  {
> +	int i;
> +	uint64_t failed_devices, cleared_failed_devices = 0;
> +	unsigned long flags;
> +	struct dm_raid_superblock *sb;
>  	struct raid_set *rs = ti->private;
> +	struct md_rdev *r;
>  
>  	set_bit(MD_CHANGE_DEVS, &rs->md.flags);
>  	if (!rs->bitmap_loaded) {
>  		bitmap_load(&rs->md);
>  		rs->bitmap_loaded = 1;
> +	} else {
> +		/*
> +		 * A secondary resume while the device is active.
> +		 * Take this opportunity to check whether any failed
> +		 * devices are reachable again.
> +		 */
> +		for (i = 0; i < rs->md.raid_disks; i++) {
> +			r = &rs->dev[i].rdev;
> +			if (test_bit(Faulty, &r->flags) && r->sb_page &&
> +			    sync_page_io(r, 0, r->sb_size,
> +					 r->sb_page, READ, 1)) {
> +				DMINFO("Faulty device #%d has readable super"
> +				       "block.  Attempting to revive it.", i);
> +				r->raid_disk = i;
> +				r->saved_raid_disk = i;
> +				flags = r->flags;
> +				clear_bit(Faulty, &r->flags);
> +				clear_bit(WriteErrorSeen, &r->flags);
> +				clear_bit(In_sync, &r->flags);
> +				if (r->mddev->pers->hot_add_disk(r->mddev, r)) {
> +					r->raid_disk = -1;
> +					r->saved_raid_disk = -1;
> +					r->flags = flags;
> +				} else {
> +					r->recovery_offset = 0;
> +					cleared_failed_devices |= 1 << i;
> +				}
> +			}
> +		}
> +		if (cleared_failed_devices) {
> +			rdev_for_each(r, &rs->md) {
> +				sb = page_address(r->sb_page);
> +				failed_devices = le64_to_cpu(sb->failed_devices);
> +				failed_devices &= ~cleared_failed_devices;
> +				sb->failed_devices = cpu_to_le64(failed_devices);
> +			}
> +		}
>  	}
>  
>  	clear_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
> @@ -1588,7 +1630,7 @@ static void raid_resume(struct dm_target
>  
>  static struct target_type raid_target = {
>  	.name = "raid",
> -	.version = {1, 5, 0},
> +	.version = {1, 5, 1},
>  	.module = THIS_MODULE,
>  	.ctr = raid_ctr,
>  	.dtr = raid_dtr,
> Index: linux-upstream/drivers/md/raid1.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid1.c
> +++ linux-upstream/drivers/md/raid1.c
> @@ -1518,8 +1518,9 @@ static int raid1_add_disk(struct mddev *
>  		p = conf->mirrors+mirror;
>  		if (!p->rdev) {
>  
> -			disk_stack_limits(mddev->gendisk, rdev->bdev,
> -					  rdev->data_offset << 9);
> +			if (mddev->gendisk)
> +				disk_stack_limits(mddev->gendisk, rdev->bdev,
> +						  rdev->data_offset << 9);
>  
>  			p->head_position = 0;
>  			rdev->raid_disk = mirror;
> @@ -1558,7 +1559,7 @@ static int raid1_add_disk(struct mddev *
>  		clear_bit(Unmerged, &rdev->flags);
>  	}
>  	md_integrity_add_rdev(rdev, mddev);
> -	if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
> +	if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
>  		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
>  	print_conf(conf);
>  	return err;
> Index: linux-upstream/drivers/md/raid10.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/raid10.c
> +++ linux-upstream/drivers/md/raid10.c
> @@ -1806,15 +1806,17 @@ static int raid10_add_disk(struct mddev
>  			set_bit(Replacement, &rdev->flags);
>  			rdev->raid_disk = mirror;
>  			err = 0;
> -			disk_stack_limits(mddev->gendisk, rdev->bdev,
> -					  rdev->data_offset << 9);
> +			if (mddev->gendisk)
> +				disk_stack_limits(mddev->gendisk, rdev->bdev,
> +						  rdev->data_offset << 9);
>  			conf->fullsync = 1;
>  			rcu_assign_pointer(p->replacement, rdev);
>  			break;
>  		}
>  
> -		disk_stack_limits(mddev->gendisk, rdev->bdev,
> -				  rdev->data_offset << 9);
> +		if (mddev->gendisk)
> +			disk_stack_limits(mddev->gendisk, rdev->bdev,
> +					  rdev->data_offset << 9);
>  
>  		p->head_position = 0;
>  		p->recovery_disabled = mddev->recovery_disabled - 1;
> Index: linux-upstream/Documentation/device-mapper/dm-raid.txt
> ===================================================================
> --- linux-upstream.orig/Documentation/device-mapper/dm-raid.txt
> +++ linux-upstream/Documentation/device-mapper/dm-raid.txt
> @@ -222,3 +222,4 @@ Version History
>  1.4.2   Add RAID10 "far" and "offset" algorithm support.
>  1.5.0   Add message interface to allow manipulation of the sync_action.
>  	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
> +1.5.1   Add ability to restore transiently failed devices on resume.
> 


Applied, thanks.  I assume this is heading for 3.11?

NeilBrown
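
The bookkeeping in the quoted raid_resume() hunk can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: struct demo_dev, demo_revive() and demo_clear_failed() are hypothetical names, and the three integer fields stand in for the Faulty flag test, the superblock read via sync_page_io(), and the result of hot_add_disk(). The sketch shifts 1ULL rather than the plain 1 used in the patch, which only makes a difference for slot indices of 31 or more.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state the patch inspects. */
struct demo_dev {
	int faulty;       /* models test_bit(Faulty, &r->flags) */
	int sb_readable;  /* models a successful superblock read */
	int readd_ok;     /* models hot_add_disk() returning 0 */
};

/*
 * Walk the devices and return a mask of the slots that were revived.
 * A device is revived only if it is faulty, its superblock is readable,
 * and the re-add succeeds; otherwise its state is left untouched.
 */
static uint64_t demo_revive(struct demo_dev *devs, size_t n)
{
	uint64_t cleared = 0;
	size_t i;

	for (i = 0; i < n && i < 64; i++) {
		if (!devs[i].faulty || !devs[i].sb_readable)
			continue;
		if (!devs[i].readd_ok)
			continue;	/* re-add failed: stays faulty */
		devs[i].faulty = 0;
		cleared |= 1ULL << i;	/* 64-bit shift, one bit per slot */
	}
	return cleared;
}

/* Drop the revived slots from a superblock's failed_devices bitmap. */
static uint64_t demo_clear_failed(uint64_t failed_devices, uint64_t cleared)
{
	return failed_devices & ~cleared;
}
```

In the patch itself the second step runs once per member device, converting each superblock's failed_devices field with le64_to_cpu() and cpu_to_le64() around the mask operation.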

Thread overview: 7+ messages
2013-04-11 20:27 [PATCH] DM RAID: Add ability to restore transiently failed devices on resume Jonathan Brassow
2013-04-22  0:43 ` NeilBrown
2013-04-22 18:57   ` Brassow Jonathan
2013-04-24  6:39     ` NeilBrown
2013-05-02 19:19 ` [PATCH - v2] " Jonathan Brassow
2013-05-06  6:00   ` NeilBrown [this message]
2013-05-06 14:55     ` Brassow Jonathan