Re: Re: md/raid5:Fix recover/replace stop if handle stipe failed

From: "majianpeng" <majianpeng@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Re: md/raid5:Fix recover/replace stop if handle stipe failed
Date: Wed, 14 Mar 2012 17:27:44 +0800
Message-ID: <201203141727412189832@gmail.com>
In-Reply-To: 201203141507458909278@gmail.com

I created a raid5 array from three disks and added bad blocks to disk0. I then set disk2 faulty, removed it, and re-added it.
Recovery seems to work, and disk2 gets the same bad blocks recorded as disk0.
But the md0_resync thread repeatedly stops and starts.
The recovery_start of disk2 stays the same every time.

In function md_do_sync():

		sectors = mddev->pers->sync_request(mddev, j, &skipped,
						  currspeed < speed_min(mddev));
		if (sectors == 0) {
			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
			goto out;
		}

		if (!skipped) { /* actual IO requested */
			io_sectors += sectors;
			atomic_add(sectors, &mddev->recovery_active);
		}

		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
			break;

		j += sectors;
		if (j>1) mddev->curr_resync = j;

If the check 'if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))' is true, the loop breaks before sectors is added to j, so curr_resync does not change.
As a result, the recovery_start of the spare disk does not change either.
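
To make the point above concrete, here is a minimal user-space sketch of that loop. It is not kernel code: struct sync_model, model_sync_request() and model_do_sync() are hypothetical names, the fixed 8-sector step is arbitrary, and raising the flag inside the sync_request stand-in is a simplification (in the kernel the flag is set from the stripe-handling path). It only shows that once the flag is seen, the loop exits before j advances, so a restart from curr_resync hits the same stripe again.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the loop; not kernel code. */
struct sync_model {
	bool intr;                 /* stands in for MD_RECOVERY_INTR */
	unsigned long curr_resync; /* stands in for mddev->curr_resync */
	unsigned long fail_at;     /* sector whose stripe fails to sync */
};

/* Stand-in for pers->sync_request(): always reports 8 sectors and
 * raises the interrupt flag when it reaches the failing stripe
 * (assumption: a simplified way to model the failure). */
static unsigned long model_sync_request(struct sync_model *m, unsigned long j)
{
	if (j == m->fail_at)
		m->intr = true;
	return 8;
}

/* Stand-in for the md_do_sync() loop; returns the final position j. */
static unsigned long model_do_sync(struct sync_model *m,
				   unsigned long start, unsigned long max)
{
	unsigned long j = start;

	m->intr = false;	/* each run starts with the flag cleared */
	while (j < max) {
		unsigned long sectors = model_sync_request(m, j);

		if (sectors == 0) {
			m->intr = true;
			break;
		}
		if (m->intr)
			break;	/* exits before j is advanced */
		j += sectors;
		if (j > 1)
			m->curr_resync = j;
	}
	return j;
}
```

With fail_at set to 16, calling model_do_sync(&m, 0, 64) stops at sector 16, and calling it again from m.curr_resync stops at sector 16 again, which matches the repeated stop/start I see.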


------------------				 
majianpeng
2012-03-14

-------------------------------------------------------------
From: NeilBrown
Sent: 2012-03-14 15:33:53
To: majianpeng
Cc: linux-raid
Subject: Re: md/raid5:Fix recover/replace stop if handle stipe failed

On Wed, 14 Mar 2012 15:07:55 +0800 "majianpeng" <majianpeng@gmail.com> wrote:

> From 849df9f6422972452b99a2c2d08d005437a52d72 Mon Sep 17 00:00:00 2001
> From: majianpeng <majianpeng@gmail.com>
> Date: Wed, 14 Mar 2012 14:41:07 +0800
> Subject: [PATCH] md/raid5:Fix recover/replace stop if handle stipe failed. 
>  If handling a stripe fails during recover/replace, we should not first
>  call md_done_sync(conf->mddev, STRIPE_SECTORS, 0), because
>  this sets MD_RECOVERY_INTR and will terminate the
>  recover/replace. The sync_thread will then repeatedly start
>  and stop.

I disagree.  It is safer to stop and then (if all seems to be working) to
start again.  We will start up exactly where we left off so there is little
cost, and I think it makes the code safer.


> 
> 
> Signed-off-by: majianpeng <majianpeng@gmail.com>
> ---
>  drivers/md/raid5.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 360f2b9..55193ef 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2472,7 +2472,6 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
>  	int abort = 0;
>  	int i;
>  
> -	md_done_sync(conf->mddev, STRIPE_SECTORS, 0);
>  	clear_bit(STRIPE_SYNCING, &sh->state);
>  	s->syncing = 0;
>  	s->replacing = 0;
> @@ -2480,8 +2479,12 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
>  	 * For recover/replace we need to record a bad block on all
>  	 * non-sync devices, or abort the recovery
>  	 */
> -	if (!test_bit(MD_RECOVERY_RECOVER, &conf->mddev->recovery))
> +	if (!test_bit(MD_RECOVERY_RECOVER, &conf->mddev->recovery)) {
> +		md_done_sync(conf->mddev, STRIPE_SECTORS, 0);
>  		return;
> +	} else
> +		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
> +
>  	/* During recovery devices cannot be removed, so locking and
>  	 * refcounting of rdevs is not needed
>  	 */
> @@ -2504,6 +2507,7 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
>  	if (abort) {
>  		conf->recovery_disabled = conf->mddev->recovery_disabled;
>  		set_bit(MD_RECOVERY_INTR, &conf->mddev->recovery);
> +		md_wakeup_thread(conf->mddev->thread);

This change seems unrelated to the above changes.

It isn't needed as this function is called only by the thread that you are
waking up, so it cannot be asleep.

Thanks,
NeilBrown



>  	}
>  }
>

