* RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-04 10:13 UTC (permalink / raw)
To: linux-raid
Hello
According to mdadm's man page:
"When creating a RAID5 array, mdadm will automatically create a degraded
array with an extra spare drive. This is because building the spare
into a degraded array is in general faster than resyncing the parity on
a non-degraded, but not clean, array. This feature can be over-ridden
with the --force option."
Unfortunately, I'm seeing what looks like a bug when I create a RAID5 array
with an internal bitmap, stop the array before the initial synchronization
is done, and then restart the array.
1° When I create the array with an internal bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
I see the last disk as a spare disk. After restarting the array, all
disks are reported active and the array does not continue the aborted
synchronization!
Note that I did not use the --assume-clean option.
2° When I create the array without a bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
I see the last disk as a spare disk. After restarting the array, the
spare disk is still a spare and the array continues the synchronization
where it left off.
In case 1°, is this a bug or did I miss something?
Secondly, what could be the consequences of this skipped synchronization?
Kernel version: 2.6.26-rc4
mdadm version: 2.6.2
Thanks,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-10 11:57 UTC (permalink / raw)
To: linux-raid
Hubert Verstraete wrote:
> Hello
>
> According to mdadm's man page:
> "When creating a RAID5 array, mdadm will automatically create a degraded
> array with an extra spare drive. This is because building the spare
> into a degraded array is in general faster than resyncing the parity on
> a non-degraded, but not clean, array. This feature can be over-ridden
> with the --force option."
>
> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
> an internal bitmap, then stop the array before the initial
> synchronization is done and restart the array.
>
> 1° When I create the array with an internal bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, all
> disks are seen active and the array is not continuing the aborted
> synchronization!
> Note that I did not use the --assume-clean option.
>
> 2° When I create the array without a bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, the
> spare disk is still a spare disk and the array continues the
> synchronization where it had stopped.
>
> In the case 1°, is this a bug or did I miss something?
> Secondly, what could be the consequences of this non-performed
> synchronization ?
>
> Kernel version: 2.6.26-rc4
> mdadm version: 2.6.2
>
> Thanks,
> Hubert
For the record, the new stable kernel 2.6.25.6 has the same issue.
I thought the patch "md: fix prexor vs sync_request race" might have
fixed this, but unfortunately it does not.
Regards,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-10 22:56 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid, Neil Brown
On Tue, Jun 10, 2008 at 4:57 AM, Hubert Verstraete <hubskml@free.fr> wrote:
> Hubert Verstraete wrote:
>>
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>> an internal bitmap, then stop the array before the initial synchronization
>> is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, all
>> disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, the
>> spare disk is still a spare disk and the array continues the synchronization
>> where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>> Secondly, what could be the consequences of this non-performed
>> synchronization ?
>>
>> Kernel version: 2.6.26-rc4
>> mdadm version: 2.6.2
>>
>> Thanks,
>> Hubert
>
> For the record, the new stable kernel 2.6.25.6 has the same issue.
> I thought maybe the patch "md: fix prexor vs sync_request race" could have
> fixed this, unfortunately not.
>
I am able to reproduce this here, and I notice that it does not happen
with v0.90 superblocks. In the v0.90 case, when the array is stopped,
the last disk remains marked as a spare. The following hack seems to
achieve the same effect for v1 arrays, but I wonder if it is
correct... Neil?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e9380b5..c38425f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1234,6 +1234,7 @@ static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);
 		switch(role) {
 		case 0xffff: /* spare */
+			set_bit(NeedRebuild, &rdev->flags);
 			break;
 		case 0xfffe: /* faulty */
 			set_bit(Faulty, &rdev->flags);
@@ -1321,7 +1322,8 @@ static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
 			sb->dev_roles[i] = cpu_to_le16(0xfffe);
 		else if (test_bit(In_sync, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
-		else if (rdev2->raid_disk >= 0 && rdev2->recovery_offset > 0)
+		else if (rdev2->raid_disk >= 0 && rdev2->recovery_offset > 0 &&
+			 !test_bit(NeedRebuild, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
 		else
 			sb->dev_roles[i] = cpu_to_le16(0xffff);
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index 3dea9f5..79201d6 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -87,6 +87,10 @@ struct mdk_rdev_s
 #define Blocked		8	/* An error occured on an externally
 				 * managed array, don't allow writes
 				 * until it is cleared */
+#define NeedRebuild	9	/* device needs to go through a rebuild
+				 * cycle before its 'role' can be saved
+				 * to disk
+				 */
 	wait_queue_head_t blocked_wait;
 
 	int desc_nr;		/* descriptor index in the superblock */
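(For reference, the difference described above can be seen by examining the
superblock of the last member after the array is stopped, e.g. with a
placeholder device name:

  mdadm --examine /dev/sde

With v0.90 metadata that device is still recorded as a spare; with v1.x
metadata it is recorded as an active member of the array.)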
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-11 9:27 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-raid, Neil Brown
Dan Williams wrote:
> On Tue, Jun 10, 2008 at 4:57 AM, Hubert Verstraete <hubskml@free.fr> wrote:
>> Hubert Verstraete wrote:
>>> Hello
>>>
>>> According to mdadm's man page:
>>> "When creating a RAID5 array, mdadm will automatically create a degraded
>>> array with an extra spare drive. This is because building the spare
>>> into a degraded array is in general faster than resyncing the parity on
>>> a non-degraded, but not clean, array. This feature can be over-ridden
>>> with the --force option."
>>>
>>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>>> an internal bitmap, then stop the array before the initial synchronization
>>> is done and restart the array.
>>>
>>> 1° When I create the array with an internal bitmap:
>>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>>> I see the last disk as a spare disk. After the restart of the array, all
>>> disks are seen active and the array is not continuing the aborted
>>> synchronization!
>>> Note that I did not use the --assume-clean option.
>>>
>>> 2° When I create the array without a bitmap:
>>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>>> I see the last disk as a spare disk. After the restart of the array, the
>>> spare disk is still a spare disk and the array continues the synchronization
>>> where it had stopped.
>>>
>>> In the case 1°, is this a bug or did I miss something?
>>> Secondly, what could be the consequences of this non-performed
>>> synchronization ?
>>>
>>> Kernel version: 2.6.26-rc4
>>> mdadm version: 2.6.2
>>>
>>> Thanks,
>>> Hubert
>> For the record, the new stable kernel 2.6.25.6 has the same issue.
>> I thought maybe the patch "md: fix prexor vs sync_request race" could have
>> fixed this, unfortunately not.
>>
>
> I am able to reproduce this here, and I notice that it does not happen
> with v0.90 superblocks. In the v0.90 case when the array is stopped
> the last disk remains marked as spare. The following hack seems to
> achieve the same effect for v1 arrays, but I wonder if it is
> correct... Neil?
Thanks Dan.
I quickly tried your patch on 2.6.25.6; unfortunately it did not fix the
issue.
Regards,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-11 14:44 UTC (permalink / raw)
To: linux-raid
Hubert Verstraete wrote:
> Hubert Verstraete wrote:
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array
>> with an internal bitmap, then stop the array before the initial
>> synchronization is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array,
>> all disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array,
>> the spare disk is still a spare disk and the array continues the
>> synchronization where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>> Secondly, what could be the consequences of this non-performed
>> synchronization ?
>>
>> Kernel version: 2.6.26-rc4
>> mdadm version: 2.6.2
>>
>> Thanks,
>> Hubert
>
> For the record, the new stable kernel 2.6.25.6 has the same issue.
> I thought maybe the patch "md: fix prexor vs sync_request race" could
> have fixed this, unfortunately not.
>
> Regards,
> Hubert
By the way, FYI: with my configuration (all disks on the same
controller, internal bitmap, v1 superblock, ...), the initial RAID-5
synchronization takes the same time whether or not I use the --force
option.
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:38 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid
On Wednesday June 11, hubskml@free.fr wrote:
>
> By the way and FYI, with my configuration, all disks on the same
> controller, internal bitmap, v1 superblock, ... the initial RAID-5
> synchronization duration is the same whether I'm using the option
> --force or not.
For this to be a valid test, you need to fill one drive up with
garbage to ensure that a resync is not a no-op.
If you don't use the "--force" option, then the recovery process will
read from N-1 drives and write to 1 drive, all completely sequentially,
so it will go at a predictable speed.
When you use "--force" it will read from N drives and check parity.
When it finds an error it will re-write that parity block.
So if the parity blocks happen to be all correct (as probably was the
case in your experiment), it will run nice and fast. If the parity
blocks happen to all be wrong (as is likely when first creating an
array on drives that weren't an array before) it will be much slower.
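(As a side note, you can see which of the two paths md has taken, sketched
here with a placeholder array name:

  cat /sys/block/md_d1/md/sync_action   # "recover" = rebuilding the spare,
                                        # "resync" = the --force parity case
)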
NeilBrown
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:40 UTC (permalink / raw)
To: Dan Williams; +Cc: Hubert Verstraete, linux-raid
On Tuesday June 10, dan.j.williams@intel.com wrote:
>
> I am able to reproduce this here, and I notice that it does not happen
> with v0.90 superblocks. In the v0.90 case when the array is stopped
> the last disk remains marked as spare. The following hack seems to
> achieve the same effect for v1 arrays, but I wonder if it is
> correct... Neil?
No, not correct.
The fact that a v1 array includes spares in the array before recovery
completes is deliberate. It allows recovery to be restarted from
where it left off if the array is shut down while recovery is
in progress.
If you don't mark the drive as part of the array (though not
in_sync), then there is no opportunity for this optimisation.
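(As an illustration, with a placeholder device name: on a 1.x member that
was stopped part-way through recovery,

  mdadm --examine /dev/sde | grep -i recovery

should report the recorded recovery offset, which is what lets the rebuild
resume from that point rather than starting again from the beginning.)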
Thanks,
NeilBrown
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:45 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid
On Wednesday June 4, hubskml@free.fr wrote:
> Hello
>
> According to mdadm's man page:
> "When creating a RAID5 array, mdadm will automatically create a degraded
> array with an extra spare drive. This is because building the spare
> into a degraded array is in general faster than resyncing the parity on
> a non-degraded, but not clean, array. This feature can be over-ridden
> with the --force option."
>
> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
> an internal bitmap, then stop the array before the initial
> synchronization is done and restart the array.
>
> 1° When I create the array with an internal bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, all
> disks are seen active and the array is not continuing the aborted
> synchronization!
> Note that I did not use the --assume-clean option.
>
> 2° When I create the array without a bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, the
> spare disk is still a spare disk and the array continues the
> synchronization where it had stopped.
>
> In the case 1°, is this a bug or did I miss something?
Thanks for the detailed report. Yes, this is a bug.
The following patch fixes it, though I'm not 100% sure this is the
right fix (it may cause too much resync in some cases, which is better
than not enough, but not ideal).
> Secondly, what could be the consequences of this non-performed
> synchronization ?
If you lose a drive, the data might get corrupted.
When writing to the array, the new parity block will sometimes be
calculated assuming that it was previously correct. If all updates to
a particular parity block are of this sort, then it will still be
incorrect when you lose a drive, and data recovered based on that
parity block will be incorrect.
Until you lose a drive, it will have no visible effect.
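(To spell out the read-modify-write case: for a small write md computes,
roughly, new_parity = old_parity XOR old_data XOR new_data, so if
old_parity was never made correct in the first place, new_parity inherits
the same error, no matter how many updates of this kind are applied.)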
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-06-10 10:27:51.000000000 +1000
+++ ./drivers/md/raid5.c	2008-06-12 09:34:25.000000000 +1000
@@ -4094,7 +4094,9 @@ static int run(mddev_t *mddev)
 				" disk %d\n", bdevname(rdev->bdev,b),
 				raid_disk);
 			working_disks++;
-		}
+		} else
+			/* Cannot rely on bitmap to complete recovery */
+			conf->fullsync = 1;
 	}
 
 	/*
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: David Greaves @ 2008-06-12 8:03 UTC (permalink / raw)
To: Neil Brown; +Cc: Hubert Verstraete, linux-raid
Neil Brown wrote:
> When writing to the array, the new parity block will sometimes be
> calculated assuming that it was previously correct. If all updates to
> a particular parity block are of this sort, then it will still be
> incorrect when you lose a drive, and data recovered based on that
> parity block will be incorrect.
>
> Until you lose a drive, it will have no visible effect.
There is a slight chance that this happened to me recently - would a
echo check > /sys/block/mdX/md/sync_action
detect this?
and would
echo repair > /sys/block/mdX/md/sync_action
fix it?
David
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 9:12 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> On Wednesday June 4, hubskml@free.fr wrote:
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>> an internal bitmap, then stop the array before the initial
>> synchronization is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, all
>> disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, the
>> spare disk is still a spare disk and the array continues the
>> synchronization where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>
> Thanks for the detailed report. Yes, this is a bug.
>
> The following patch fixes it, though I'm not 100% sure this is the
> right fix (it may cause too much resync in some cases, which is better
> than not enough, but not ideal).
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c 2008-06-10 10:27:51.000000000 +1000
> +++ ./drivers/md/raid5.c 2008-06-12 09:34:25.000000000 +1000
> @@ -4094,7 +4094,9 @@ static int run(mddev_t *mddev)
> " disk %d\n", bdevname(rdev->bdev,b),
> raid_disk);
> working_disks++;
> - }
> + } else
> + /* Cannot rely on bitmap to complete recovery */
> + conf->fullsync = 1;
> }
>
> /*
Thanks Neil, I can confirm this solves the issue.
Regarding the possible extra resync, I can't say.
Regards,
Hubert Verstraete
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 13:05 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> On Wednesday June 11, hubskml@free.fr wrote:
>> By the way and FYI, with my configuration, all disks on the same
>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>> synchronization duration is the same whether I'm using the option
>> --force or not.
>
> For this to be a valid test, you need to fill one drive up with
> garbage to ensure that a resync is no a no-op.
>
> If you don't use the "--force" option, then the recovery process will
> read from N-1 drives and write to 1 drive, all completely sequentially
> so it will go at a predictable speed.
>
> When you use "--force" it will read from N drive and check parity.
> When it finds an error it will re-write that parity block.
> So if the parity blocks happen to be all correct (as probably was the
> case in your experiment), it will run nice and fast. If the parity
> blocks happen to all be wrong (as is likely when first creating an
> array on drives that weren't an array before) it will be much slower.
I've just filled all the drives with /dev/zero and am currently building
a new array. Is this a valid test, or should I fill the drives with
/dev/random?
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-12 16:59 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: Neil Brown, linux-raid
On Thu, Jun 12, 2008 at 6:05 AM, Hubert Verstraete <hubskml@free.fr> wrote:
> Neil Brown wrote:
>>
>> On Wednesday June 11, hubskml@free.fr wrote:
>>>
>>> By the way and FYI, with my configuration, all disks on the same
>>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>>> synchronization duration is the same whether I'm using the option --force or
>>> not.
>>
>> For this to be a valid test, you need to fill one drive up with
>> garbage to ensure that a resync is no a no-op.
>>
>> If you don't use the "--force" option, then the recovery process will
>> read from N-1 drives and write to 1 drive, all completely sequentially
>> so it will go at a predictable speed.
>>
>> When you use "--force" it will read from N drive and check parity.
>> When it finds an error it will re-write that parity block.
>> So if the parity blocks happen to be all correct (as probably was the
>> case in your experiment), it will run nice and fast. If the parity
>> blocks happen to all be wrong (as is likely when first creating an
>> array on drives that weren't an array before) it will be much slower.
>
> I've just filled all the drives with /dev/zero and am currently building a
> new array. Is this a valid test or should I fill the drives with /dev/random
> ?
>
No, /dev/zero will not work for this test since the xor sum of zeroed
blocks is zero. The check operation has the following stages: read
the parity from disk, verify it is correct, and if it is not, calculate
and write back the correct parity. So, you need to ensure that the
parity check fails. Using /dev/urandom should be sufficient. Do not
use /dev/random as it will exhaust the entropy pool and then block.
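(For example, something along these lines before creating the array, with a
placeholder device name, would do:

  dd if=/dev/urandom of=/dev/sde bs=1M   # overwrite one member with random data

so that the parity computed across the members is almost certainly wrong.)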
--
Dan
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-12 17:01 UTC (permalink / raw)
To: David Greaves; +Cc: Neil Brown, Hubert Verstraete, linux-raid
On Thu, Jun 12, 2008 at 1:03 AM, David Greaves <david@dgreaves.com> wrote:
> Neil Brown wrote:
>> When writing to the array, the new parity block will sometimes be
>> calculated assuming that it was previously correct. If all updates to
>> a particular parity block are of this sort, then it will still be
>> incorrect when you lose a drive, and data recovered based on that
>> parity block will be incorrect.
>>
>> Until you lose a drive, it will have no visible effect.
>
> There is a slight chance that this happened to me recently - would a
> echo check > /sys/block/mdX/md/sync_action
> detect this?
> and would
> echo repair > /sys/block/mdX/md/sync_action
> fix it?
>
'Yes' on both questions.
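(Roughly, reusing the placeholder array name from above:

  echo check > /sys/block/mdX/md/sync_action
  # wait for the check to finish, then:
  cat /sys/block/mdX/md/mismatch_cnt      # non-zero means bad parity was found
  echo repair > /sys/block/mdX/md/sync_action   # recomputes and rewrites parity
)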
--
Dan
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 18:11 UTC (permalink / raw)
To: Dan Williams; +Cc: Neil Brown, linux-raid
Dan Williams wrote:
> On Thu, Jun 12, 2008 at 6:05 AM, Hubert Verstraete <hubskml@free.fr> wrote:
>> Neil Brown wrote:
>>> On Wednesday June 11, hubskml@free.fr wrote:
>>>> By the way and FYI, with my configuration, all disks on the same
>>>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>>>> synchronization duration is the same whether I'm using the option --force or
>>>> not.
>>> For this to be a valid test, you need to fill one drive up with
>>> garbage to ensure that a resync is no a no-op.
>>>
>>> If you don't use the "--force" option, then the recovery process will
>>> read from N-1 drives and write to 1 drive, all completely sequentially
>>> so it will go at a predictable speed.
>>>
>>> When you use "--force" it will read from N drive and check parity.
>>> When it finds an error it will re-write that parity block.
>>> So if the parity blocks happen to be all correct (as probably was the
>>> case in your experiment), it will run nice and fast. If the parity
>>> blocks happen to all be wrong (as is likely when first creating an
>>> array on drives that weren't an array before) it will be much slower.
>> I've just filled all the drives with /dev/zero and am currently building a
>> new array. Is this a valid test or should I fill the drives with /dev/random
>> ?
>
> No, /dev/zero will not work for this test since the xor sum of zeroed
> blocks is zero. The check operation has the following stages: read
> the parity from disk, verify it is correct, and if it is not correct
> calculate / writeback correct parity. So, you need to ensure that the
> parity check fails. Using /dev/urandom should be sufficient. Do not
> use /dev/random as it will exhaust the entropy pool and then block.
Confirmed: I got the same resync time after zeroing the hard drives.
Nevertheless, let's say I build the arrays on brand-new disks.
If I can assume these disks are full of zeros, I can freely choose my
initial resync method :)
And it seems my new Seagate disk really is all zeros (I'm currently
running a diff against /dev/zero).
Thanks for the explanations.
Hubert
Thread overview: 14 messages
2008-06-04 10:13 RAID5 losing initial synchronization on restart when one disk is spare Hubert Verstraete
2008-06-10 11:57 ` Hubert Verstraete
2008-06-10 22:56 ` Dan Williams
2008-06-11 9:27 ` Hubert Verstraete
2008-06-11 23:40 ` Neil Brown
2008-06-11 14:44 ` Hubert Verstraete
2008-06-11 23:38 ` Neil Brown
2008-06-12 13:05 ` Hubert Verstraete
2008-06-12 16:59 ` Dan Williams
2008-06-12 18:11 ` Hubert Verstraete
2008-06-11 23:45 ` Neil Brown
2008-06-12 8:03 ` David Greaves
2008-06-12 17:01 ` Dan Williams
2008-06-12 9:12 ` Hubert Verstraete