* [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
@ 2014-07-28 8:09 jiao hui
2014-07-28 8:23 ` jiao hui
2014-07-29 2:44 ` NeilBrown
0 siblings, 2 replies; 5+ messages in thread
From: jiao hui @ 2014-07-28 8:09 UTC (permalink / raw)
To: linux-raid, NeilBrown; +Cc: guomingyang, zhaomeng
From 1fdbfb8552c00af55d11d7a63cdafbdf1749ff63 Mon Sep 17 00:00:00 2001
From: Jiao Hui <simonjiaoh@gmail.com>
Date: Mon, 28 Jul 2014 11:57:20 +0800
Subject: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
In the recovery of a raid1 with a bitmap, actual resync IO only happens for a bitmap bit
that has the NEEDED or RESYNC flag set. The sync_thread checks each rdev: if any rdev is
missing or has the FAULTY flag, the array is still_degraded and the bitmap bit's NEEDED
flag is not cleared. Otherwise the NEEDED flag is cleared and the RESYNC flag is set. The
RESYNC flag is cleared later in bitmap_cond_end_sync or bitmap_close_sync.
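For illustration, a condensed sketch of that decision as raid1's sync_request() makes it
(paraphrased and simplified from the drivers/md code of that era, not the exact code):

    still_degraded = 0;
    for (i = 0; i < conf->raid_disks * 2; i++) {
        struct md_rdev *rdev = conf->mirrors[i].rdev;
        if (rdev == NULL || test_bit(Faulty, &rdev->flags))
            still_degraded = 1;   /* a member is missing or faulty */
    }
    /* If still_degraded, bitmap_start_sync() leaves the NEEDED bit set;
     * otherwise it clears NEEDED and sets RESYNC for this chunk.  RESYNC
     * is dropped later by bitmap_cond_end_sync()/bitmap_close_sync().
     */
    bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded);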
If the only disk being recovered fails again while raid1 recovery is in progress,
the resync_thread cannot find a non-In_sync disk to write to, so the remaining recovery
is skipped. The raid1 error handler only sets the MD_RECOVERY_INTR flag when an In_sync
disk fails, but the disk being recovered is not In_sync, so md_do_sync never gets the
INTR signal to break out, and mddev->curr_resync is advanced all the way to max_sectors
(mddev->dev_sectors). When the raid1 personality then finishes the resync, no bitmap bit
with the RESYNC flag can be set back to NEEDED, and bitmap_close_sync clears the RESYNC
flag. When the disk is added back, the area from the offset where the last recovery
stopped to the end of the bitmap chunk is skipped by the resync_thread forever.
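To make the failure mode concrete, a rough sketch of the md_do_sync() loop referred to
above (heavily simplified, not the verbatim md.c code):

    while (j < max_sectors) {
        if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
            break;    /* abort: the region can be put back to NEEDED */
        /* With no write target left, raid1's sync_request() just reports
         * the remaining range as skipped, so without MD_RECOVERY_INTR
         * curr_resync is advanced to max_sectors and the recovery
         * appears to finish cleanly.
         */
        sectors = mddev->pers->sync_request(mddev, j, &skipped);
        j += sectors;
        mddev->curr_resync = j;
    }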
Signed-off-by: JiaoHui <jiaohui@bwstor.com.cn>
---
drivers/md/raid1.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aacf6bf..51d06eb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1391,16 +1391,16 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
return;
}
set_bit(Blocked, &rdev->flags);
+ /*
+ * if recovery is running, make sure it aborts.
+ */
+ set_bit(MD_RECOVERY_INTR, &mddev->recovery);
if (test_and_clear_bit(In_sync, &rdev->flags)) {
unsigned long flags;
spin_lock_irqsave(&conf->device_lock, flags);
mddev->degraded++;
set_bit(Faulty, &rdev->flags);
spin_unlock_irqrestore(&conf->device_lock, flags);
- /*
- * if recovery is running, make sure it aborts.
- */
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
} else
set_bit(Faulty, &rdev->flags);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
--
1.8.3.1
* Re: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
2014-07-28 8:09 [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption jiao hui
@ 2014-07-28 8:23 ` jiao hui
2014-07-29 2:44 ` NeilBrown
1 sibling, 0 replies; 5+ messages in thread
From: jiao hui @ 2014-07-28 8:23 UTC (permalink / raw)
To: jiao hui; +Cc: linux-raid, NeilBrown, guomingyang, zhaomeng
I can reproduce this issue almost every time with a raid1 that has a fairly
large bitmap chunk, such as 64MB.
I made this patch against CentOS 7.0.
Any comments are welcome.
On Mon, Jul 28, 2014 at 4:09 PM, jiao hui <jiaohui@bwstor.com.cn> wrote:
> From 1fdbfb8552c00af55d11d7a63cdafbdf1749ff63 Mon Sep 17 00:00:00 2001
> From: Jiao Hui <simonjiaoh@gmail.com>
> Date: Mon, 28 Jul 2014 11:57:20 +0800
> Subject: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
>
> In the recovery of a raid1 with a bitmap, actual resync IO only happens for a bitmap bit
> that has the NEEDED or RESYNC flag set. The sync_thread checks each rdev: if any rdev is
> missing or has the FAULTY flag, the array is still_degraded and the bitmap bit's NEEDED
> flag is not cleared. Otherwise the NEEDED flag is cleared and the RESYNC flag is set. The
> RESYNC flag is cleared later in bitmap_cond_end_sync or bitmap_close_sync.
>
> If the only disk being recovered fails again while raid1 recovery is in progress,
> the resync_thread cannot find a non-In_sync disk to write to, so the remaining recovery
> is skipped. The raid1 error handler only sets the MD_RECOVERY_INTR flag when an In_sync
> disk fails, but the disk being recovered is not In_sync, so md_do_sync never gets the
> INTR signal to break out, and mddev->curr_resync is advanced all the way to max_sectors
> (mddev->dev_sectors). When the raid1 personality then finishes the resync, no bitmap bit
> with the RESYNC flag can be set back to NEEDED, and bitmap_close_sync clears the RESYNC
> flag. When the disk is added back, the area from the offset where the last recovery
> stopped to the end of the bitmap chunk is skipped by the resync_thread forever.
>
> Signed-off-by: JiaoHui <jiaohui@bwstor.com.cn>
>
> ---
> drivers/md/raid1.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index aacf6bf..51d06eb 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1391,16 +1391,16 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
> return;
> }
> set_bit(Blocked, &rdev->flags);
> + /*
> + * if recovery is running, make sure it aborts.
> + */
> + set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> if (test_and_clear_bit(In_sync, &rdev->flags)) {
> unsigned long flags;
> spin_lock_irqsave(&conf->device_lock, flags);
> mddev->degraded++;
> set_bit(Faulty, &rdev->flags);
> spin_unlock_irqrestore(&conf->device_lock, flags);
> - /*
> - * if recovery is running, make sure it aborts.
> - */
> - set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> } else
> set_bit(Faulty, &rdev->flags);
> set_bit(MD_CHANGE_DEVS, &mddev->flags);
> --
> 1.8.3.1
>
>
>
* Re: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
2014-07-28 8:09 [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption jiao hui
2014-07-28 8:23 ` jiao hui
@ 2014-07-29 2:44 ` NeilBrown
2014-07-29 6:50 ` jiao hui
1 sibling, 1 reply; 5+ messages in thread
From: NeilBrown @ 2014-07-29 2:44 UTC (permalink / raw)
To: jiao hui; +Cc: linux-raid, guomingyang, zhaomeng
On Mon, 28 Jul 2014 16:09:33 +0800 jiao hui <jiaohui@bwstor.com.cn> wrote:
> >From 1fdbfb8552c00af55d11d7a63cdafbdf1749ff63 Mon Sep 17 00:00:00 2001
> From: Jiao Hui <simonjiaoh@gmail.com>
> Date: Mon, 28 Jul 2014 11:57:20 +0800
> Subject: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
>
> In the recovery of a raid1 with a bitmap, actual resync IO only happens for a bitmap bit
> that has the NEEDED or RESYNC flag set. The sync_thread checks each rdev: if any rdev is
> missing or has the FAULTY flag, the array is still_degraded and the bitmap bit's NEEDED
> flag is not cleared. Otherwise the NEEDED flag is cleared and the RESYNC flag is set. The
> RESYNC flag is cleared later in bitmap_cond_end_sync or bitmap_close_sync.
>
> If the only disk being recovered fails again while raid1 recovery is in progress,
> the resync_thread cannot find a non-In_sync disk to write to, so the remaining recovery
> is skipped. The raid1 error handler only sets the MD_RECOVERY_INTR flag when an In_sync
> disk fails, but the disk being recovered is not In_sync, so md_do_sync never gets the
> INTR signal to break out, and mddev->curr_resync is advanced all the way to max_sectors
> (mddev->dev_sectors). When the raid1 personality then finishes the resync, no bitmap bit
> with the RESYNC flag can be set back to NEEDED, and bitmap_close_sync clears the RESYNC
> flag. When the disk is added back, the area from the offset where the last recovery
> stopped to the end of the bitmap chunk is skipped by the resync_thread forever.
>
> Signed-off-by: JiaoHui <jiaohui@bwstor.com.cn>
>
> ---
> drivers/md/raid1.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index aacf6bf..51d06eb 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1391,16 +1391,16 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
> return;
> }
> set_bit(Blocked, &rdev->flags);
> + /*
> + * if recovery is running, make sure it aborts.
> + */
> + set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> if (test_and_clear_bit(In_sync, &rdev->flags)) {
> unsigned long flags;
> spin_lock_irqsave(&conf->device_lock, flags);
> mddev->degraded++;
> set_bit(Faulty, &rdev->flags);
> spin_unlock_irqrestore(&conf->device_lock, flags);
> - /*
> - * if recovery is running, make sure it aborts.
> - */
> - set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> } else
> set_bit(Faulty, &rdev->flags);
> set_bit(MD_CHANGE_DEVS, &mddev->flags);
Hi,
thanks for the report and the patch.
If the recovery process gets a write error it will abort the current bitmap
region by calling bitmap_end_sync() in end_sync_write().
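Roughly, that existing abort path looks like the following (a simplified sketch of
end_sync_write(), not the exact code):

    if (!uptodate) {
        /* recovery write failed: put the bits covering this r1_bio back
         * to NEEDED so the whole region is retried later */
        sector_t sync_blocks = 0;
        sector_t s = r1_bio->sector;
        long sectors_to_go = r1_bio->sectors;
        do {
            bitmap_end_sync(mddev->bitmap, s, &sync_blocks, 1);
            s += sync_blocks;
            sectors_to_go -= sync_blocks;
        } while (sectors_to_go > 0);
    }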
However, you are talking about a different situation, where a normal IO write
gets an error and fails a drive. Then the recovery aborts without aborting
the current bitmap region.
I think I would rather fix the bug by calling bitmap_end_sync() at the place
where the recovery decides to abort, as in the following patch.
Would you be able to test it please and confirm that it works?
A similar fix will probably be needed for raid10.
Thanks,
NeilBrown
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 56e24c072b62..4f007a410f4b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2668,9 +2668,11 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
if (write_targets == 0 || read_targets == 0) {
/* There is nowhere to write, so all non-sync
- * drives must be failed - so we are finished
+ * drives must be failed - so we are finished.
+ * But abort the current bitmap region though.
*/
sector_t rv;
+ bitmap_end_sync(mddev->bitmap, sector_nr, &sync_blocks, 1);
if (min_bad > 0)
max_sector = sector_nr + min_bad;
rv = max_sector - sector_nr;
* Re: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
2014-07-29 2:44 ` NeilBrown
@ 2014-07-29 6:50 ` jiao hui
2014-07-30 3:39 ` NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: jiao hui @ 2014-07-29 6:50 UTC (permalink / raw)
To: NeilBrown; +Cc: jiao hui, linux-raid, guomingyang, zhaomeng
Hi Neil,
The patch works. I tested it on CentOS 7.0 for fifty rounds; no
consistency issue found.
Best regards,
jiaohui
On Tue, Jul 29, 2014 at 10:44 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 28 Jul 2014 16:09:33 +0800 jiao hui <jiaohui@bwstor.com.cn> wrote:
>
>> >From 1fdbfb8552c00af55d11d7a63cdafbdf1749ff63 Mon Sep 17 00:00:00 2001
>> From: Jiao Hui <simonjiaoh@gmail.com>
>> Date: Mon, 28 Jul 2014 11:57:20 +0800
>> Subject: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
>>
>> In the recovery of a raid1 with a bitmap, actual resync IO only happens for a bitmap bit
>> that has the NEEDED or RESYNC flag set. The sync_thread checks each rdev: if any rdev is
>> missing or has the FAULTY flag, the array is still_degraded and the bitmap bit's NEEDED
>> flag is not cleared. Otherwise the NEEDED flag is cleared and the RESYNC flag is set. The
>> RESYNC flag is cleared later in bitmap_cond_end_sync or bitmap_close_sync.
>>
>> If the only disk being recovered fails again while raid1 recovery is in progress,
>> the resync_thread cannot find a non-In_sync disk to write to, so the remaining recovery
>> is skipped. The raid1 error handler only sets the MD_RECOVERY_INTR flag when an In_sync
>> disk fails, but the disk being recovered is not In_sync, so md_do_sync never gets the
>> INTR signal to break out, and mddev->curr_resync is advanced all the way to max_sectors
>> (mddev->dev_sectors). When the raid1 personality then finishes the resync, no bitmap bit
>> with the RESYNC flag can be set back to NEEDED, and bitmap_close_sync clears the RESYNC
>> flag. When the disk is added back, the area from the offset where the last recovery
>> stopped to the end of the bitmap chunk is skipped by the resync_thread forever.
>>
>> Signed-off-by: JiaoHui <jiaohui@bwstor.com.cn>
>>
>> ---
>> drivers/md/raid1.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index aacf6bf..51d06eb 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1391,16 +1391,16 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
>> return;
>> }
>> set_bit(Blocked, &rdev->flags);
>> + /*
>> + * if recovery is running, make sure it aborts.
>> + */
>> + set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> if (test_and_clear_bit(In_sync, &rdev->flags)) {
>> unsigned long flags;
>> spin_lock_irqsave(&conf->device_lock, flags);
>> mddev->degraded++;
>> set_bit(Faulty, &rdev->flags);
>> spin_unlock_irqrestore(&conf->device_lock, flags);
>> - /*
>> - * if recovery is running, make sure it aborts.
>> - */
>> - set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> } else
>> set_bit(Faulty, &rdev->flags);
>> set_bit(MD_CHANGE_DEVS, &mddev->flags);
>
>
> Hi,
> thanks for the report and the patch.
>
> If the recovery process gets a write error it will abort the current bitmap
> region by calling bitmap_end_sync() in end_sync_write().
> However, you are talking about a different situation, where a normal IO write
> gets an error and fails a drive. Then the recovery aborts without aborting
> the current bitmap region.
>
> I think I would rather fix the bug by calling bitmap_end_sync() at the place
> where the recovery decides to abort, as in the following patch.
> Would you be able to test it please and confirm that it works?
>
> A similar fix will probably be needed for raid10.
>
> Thanks,
> NeilBrown
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 56e24c072b62..4f007a410f4b 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2668,9 +2668,11 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
>
> if (write_targets == 0 || read_targets == 0) {
> /* There is nowhere to write, so all non-sync
> - * drives must be failed - so we are finished
> + * drives must be failed - so we are finished.
> + * But abort the current bitmap region though.
> */
> sector_t rv;
> + bitmap_end_sync(mddev->bitmap, sector_nr, &sync_blocks, 1);
> if (min_bad > 0)
> max_sector = sector_nr + min_bad;
> rv = max_sector - sector_nr;
* Re: [PATCH] md/raid1: always set MD_RECOVERY_INTR flag in raid1 error handler to avoid potential data corruption
2014-07-29 6:50 ` jiao hui
@ 2014-07-30 3:39 ` NeilBrown
0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2014-07-30 3:39 UTC (permalink / raw)
To: jiao hui; +Cc: linux-raid, guomingyang, zhaomeng
On Tue, 29 Jul 2014 14:50:04 +0800 jiao hui <jiaohui@bwstor.com.cn> wrote:
> Hi Neil,
>
> The patch works. I tested it on CentOS 7.0 for fifty rounds; no
> consistency issue found.
>
> Best regards,
> jiaohui
>
Thanks for testing.
I looked again, compared with raid10, and decided that your fix was actually
better. If you are recovering two drives at once and only one fails, your
patch will do the right thing, but mine won't.
I'll be submitting the following.
Thanks,
NeilBrown
From b628438e59827e710df20c27fea680cbe1870272 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 30 Jul 2014 13:24:50 +1000
Subject: [PATCH] md/raid1,raid10: always abort recover on write error.
Currently we don't abort recovery on a write error if the write error
to the recovering device was triggered by normal IO (as opposed to
recovery IO).
This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices). In this case
the bitmap bit will be cleared, but it really shouldn't.
The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggered the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.
If we abort the recovery, the region being processed will be cancelled
(bit not cleared) and the whole region will be retried.
As the bug can result in data corruption the patch is suitable for
-stable. For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.
Original-from: jiao hui <jiaohui@bwstor.com.cn>
Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 56e24c072b62..d7690f86fdb9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1501,12 +1501,12 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
mddev->degraded++;
set_bit(Faulty, &rdev->flags);
spin_unlock_irqrestore(&conf->device_lock, flags);
- /*
- * if recovery is running, make sure it aborts.
- */
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
} else
set_bit(Faulty, &rdev->flags);
+ /*
+ * if recovery is running, make sure it aborts.
+ */
+ set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
printk(KERN_ALERT
"md/raid1:%s: Disk failure on %s, disabling device.\n"
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cb882aae9e20..b08c18871323 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1684,13 +1684,12 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
spin_unlock_irqrestore(&conf->device_lock, flags);
return;
}
- if (test_and_clear_bit(In_sync, &rdev->flags)) {
+ if (test_and_clear_bit(In_sync, &rdev->flags))
mddev->degraded++;
- /*
- * if recovery is running, make sure it aborts.
- */
- set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- }
+ /*
+ * If recovery is running, make sure it aborts.
+ */
+ set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags);
set_bit(Faulty, &rdev->flags);
set_bit(MD_CHANGE_DEVS, &mddev->flags);