* RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-04 10:13 UTC (permalink / raw)
To: linux-raid
Hello
According to mdadm's man page:
"When creating a RAID5 array, mdadm will automatically create a degraded
array with an extra spare drive. This is because building the spare
into a degraded array is in general faster than resyncing the parity on
a non-degraded, but not clean, array. This feature can be over-ridden
with the --force option."
Unfortunately, I'm seeing what looks like a bug when I create a RAID5 array
with an internal bitmap, stop the array before the initial synchronization
is done, and then restart the array.
1° When I create the array with an internal bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
I see the last disk as a spare disk. After restarting the array, all
disks are reported active and the array does not continue the aborted
synchronization!
Note that I did not use the --assume-clean option.
2° When I create the array without a bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
I see the last disk as a spare disk. After restarting the array, the
spare disk is still a spare and the array continues the synchronization
where it left off.
In case 1°, is this a bug or did I miss something?
Secondly, what could be the consequences of this skipped synchronization?
Kernel version: 2.6.26-rc4
mdadm version: 2.6.2
Thanks,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-10 11:57 UTC (permalink / raw)
To: linux-raid
Hubert Verstraete wrote:
> Hello
>
> According to mdadm's man page:
> "When creating a RAID5 array, mdadm will automatically create a degraded
> array with an extra spare drive. This is because building the spare
> into a degraded array is in general faster than resyncing the parity on
> a non-degraded, but not clean, array. This feature can be over-ridden
> with the --force option."
>
> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
> an internal bitmap, then stop the array before the initial
> synchronization is done and restart the array.
>
> 1° When I create the array with an internal bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, all
> disks are seen active and the array is not continuing the aborted
> synchronization!
> Note that I did not use the --assume-clean option.
>
> 2° When I create the array without a bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, the
> spare disk is still a spare disk and the array continues the
> synchronization where it had stopped.
>
> In the case 1°, is this a bug or did I miss something?
> Secondly, what could be the consequences of this non-performed
> synchronization ?
>
> Kernel version: 2.6.26-rc4
> mdadm version: 2.6.2
>
> Thanks,
> Hubert
For the record, the new stable kernel 2.6.25.6 has the same issue.
I thought the patch "md: fix prexor vs sync_request race" might have
fixed this, but unfortunately it does not.
Regards,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-10 22:56 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid, Neil Brown
On Tue, Jun 10, 2008 at 4:57 AM, Hubert Verstraete <hubskml@free.fr> wrote:
> Hubert Verstraete wrote:
>>
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>> an internal bitmap, then stop the array before the initial synchronization
>> is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, all
>> disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, the
>> spare disk is still a spare disk and the array continues the synchronization
>> where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>> Secondly, what could be the consequences of this non-performed
>> synchronization ?
>>
>> Kernel version: 2.6.26-rc4
>> mdadm version: 2.6.2
>>
>> Thanks,
>> Hubert
>
> For the record, the new stable kernel 2.6.25.6 has the same issue.
> I thought maybe the patch "md: fix prexor vs sync_request race" could have
> fixed this, unfortunately not.
>
I am able to reproduce this here, and I notice that it does not happen
with v0.90 superblocks. In the v0.90 case, when the array is stopped,
the last disk remains marked as a spare. The following hack seems to
achieve the same effect for v1 arrays, but I wonder if it is
correct... Neil?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e9380b5..c38425f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1234,6 +1234,7 @@ static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
 		role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);
 		switch(role) {
 		case 0xffff: /* spare */
+			set_bit(NeedRebuild, &rdev->flags);
 			break;
 		case 0xfffe: /* faulty */
 			set_bit(Faulty, &rdev->flags);
@@ -1321,7 +1322,8 @@ static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
 			sb->dev_roles[i] = cpu_to_le16(0xfffe);
 		else if (test_bit(In_sync, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
-		else if (rdev2->raid_disk >= 0 && rdev2->recovery_offset > 0)
+		else if (rdev2->raid_disk >= 0 && rdev2->recovery_offset > 0 &&
+			 !test_bit(NeedRebuild, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
 		else
 			sb->dev_roles[i] = cpu_to_le16(0xffff);
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index 3dea9f5..79201d6 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -87,6 +87,10 @@ struct mdk_rdev_s
 #define Blocked		8	/* An error occured on an externally
 				 * managed array, don't allow writes
 				 * until it is cleared */
+#define NeedRebuild	9	/* device needs to go through a rebuild
+				 * cycle before its 'role' can be saved
+				 * to disk
+				 */
 	wait_queue_head_t blocked_wait;
 
 	int desc_nr;		/* descriptor index in the superblock */
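(For reference, the difference described above can be seen by examining the
superblock of the last member after the array is stopped, e.g. with a
placeholder device name:

  mdadm --examine /dev/sde

With v0.90 metadata that device is still recorded as a spare; with v1.x
metadata it is recorded as an active member of the array.)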
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-11 9:27 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-raid, Neil Brown
Dan Williams wrote:
> On Tue, Jun 10, 2008 at 4:57 AM, Hubert Verstraete <hubskml@free.fr> wrote:
>> Hubert Verstraete wrote:
>>> Hello
>>>
>>> According to mdadm's man page:
>>> "When creating a RAID5 array, mdadm will automatically create a degraded
>>> array with an extra spare drive. This is because building the spare
>>> into a degraded array is in general faster than resyncing the parity on
>>> a non-degraded, but not clean, array. This feature can be over-ridden
>>> with the --force option."
>>>
>>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>>> an internal bitmap, then stop the array before the initial synchronization
>>> is done and restart the array.
>>>
>>> 1° When I create the array with an internal bitmap:
>>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>>> I see the last disk as a spare disk. After the restart of the array, all
>>> disks are seen active and the array is not continuing the aborted
>>> synchronization!
>>> Note that I did not use the --assume-clean option.
>>>
>>> 2° When I create the array without a bitmap:
>>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>>> I see the last disk as a spare disk. After the restart of the array, the
>>> spare disk is still a spare disk and the array continues the synchronization
>>> where it had stopped.
>>>
>>> In the case 1°, is this a bug or did I miss something?
>>> Secondly, what could be the consequences of this non-performed
>>> synchronization ?
>>>
>>> Kernel version: 2.6.26-rc4
>>> mdadm version: 2.6.2
>>>
>>> Thanks,
>>> Hubert
>> For the record, the new stable kernel 2.6.25.6 has the same issue.
>> I thought maybe the patch "md: fix prexor vs sync_request race" could have
>> fixed this, unfortunately not.
>>
>
> I am able to reproduce this here, and I notice that it does not happen
> with v0.90 superblocks. In the v0.90 case when the array is stopped
> the last disk remains marked as spare. The following hack seems to
> achieve the same effect for v1 arrays, but I wonder if it is
> correct... Neil?
Thanks Dan.
I quickly tried your patch on 2.6.25.6; unfortunately it did not fix the
issue.
Regards,
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-11 14:44 UTC (permalink / raw)
To: linux-raid
Hubert Verstraete wrote:
> Hubert Verstraete wrote:
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array
>> with an internal bitmap, then stop the array before the initial
>> synchronization is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array,
>> all disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array,
>> the spare disk is still a spare disk and the array continues the
>> synchronization where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>> Secondly, what could be the consequences of this non-performed
>> synchronization ?
>>
>> Kernel version: 2.6.26-rc4
>> mdadm version: 2.6.2
>>
>> Thanks,
>> Hubert
>
> For the record, the new stable kernel 2.6.25.6 has the same issue.
> I thought maybe the patch "md: fix prexor vs sync_request race" could
> have fixed this, unfortunately not.
>
> Regards,
> Hubert
By the way, FYI: with my configuration (all disks on the same
controller, internal bitmap, v1 superblock, ...), the initial RAID-5
synchronization takes the same time whether or not I use the --force
option.
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:38 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid
On Wednesday June 11, hubskml@free.fr wrote:
>
> By the way and FYI, with my configuration, all disks on the same
> controller, internal bitmap, v1 superblock, ... the initial RAID-5
> synchronization duration is the same whether I'm using the option
> --force or not.
For this to be a valid test, you need to fill one drive up with
garbage to ensure that a resync is not a no-op.
If you don't use the "--force" option, then the recovery process will
read from N-1 drives and write to 1 drive, all completely sequentially,
so it will go at a predictable speed.
When you use "--force" it will read from N drives and check parity.
When it finds an error it will re-write that parity block.
So if the parity blocks happen to be all correct (as probably was the
case in your experiment), it will run nice and fast. If the parity
blocks happen to all be wrong (as is likely when first creating an
array on drives that weren't an array before) it will be much slower.
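(As a side note, you can see which of the two paths md has taken, sketched
here with a placeholder array name:

  cat /sys/block/md_d1/md/sync_action   # "recover" = rebuilding the spare,
                                        # "resync" = the --force parity case
)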
NeilBrown
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:40 UTC (permalink / raw)
To: Dan Williams; +Cc: Hubert Verstraete, linux-raid
On Tuesday June 10, dan.j.williams@intel.com wrote:
>
> I am able to reproduce this here, and I notice that it does not happen
> with v0.90 superblocks. In the v0.90 case when the array is stopped
> the last disk remains marked as spare. The following hack seems to
> achieve the same effect for v1 arrays, but I wonder if it is
> correct... Neil?
No, not correct.
The fact that a v1 array includes spares in the array before recovery
completes is deliberate. It allows recovery to be restarted from
where it left off if the array is shut down while recovery is
in progress.
If you don't mark the drive as part of the array (though not
in_sync), then there is no opportunity for this optimisation.
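(As an illustration, with a placeholder device name: on a 1.x member that
was stopped part-way through recovery,

  mdadm --examine /dev/sde | grep -i recovery

should report the recorded recovery offset, which is what lets the rebuild
resume from that point rather than starting again from the beginning.)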
Thanks,
NeilBrown
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Neil Brown @ 2008-06-11 23:45 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: linux-raid
On Wednesday June 4, hubskml@free.fr wrote:
> Hello
>
> According to mdadm's man page:
> "When creating a RAID5 array, mdadm will automatically create a degraded
> array with an extra spare drive. This is because building the spare
> into a degraded array is in general faster than resyncing the parity on
> a non-degraded, but not clean, array. This feature can be over-ridden
> with the --force option."
>
> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
> an internal bitmap, then stop the array before the initial
> synchronization is done and restart the array.
>
> 1° When I create the array with an internal bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, all
> disks are seen active and the array is not continuing the aborted
> synchronization!
> Note that I did not use the --assume-clean option.
>
> 2° When I create the array without a bitmap:
> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
> I see the last disk as a spare disk. After the restart of the array, the
> spare disk is still a spare disk and the array continues the
> synchronization where it had stopped.
>
> In the case 1°, is this a bug or did I miss something?
Thanks for the detailed report. Yes, this is a bug.
The following patch fixes it, though I'm not 100% sure this is the
right fix (it may cause too much resync in some cases, which is better
than not enough, but not ideal).
> Secondly, what could be the consequences of this non-performed
> synchronization ?
If you lose a drive, the data might get corrupted.
When writing to the array, the new parity block will sometimes be
calculated assuming that it was previously correct. If all updates to
a particular parity block are of this sort, then it will still be
incorrect when you lose a drive, and data recovered based on that
parity block will be incorrect.
Until you lose a drive, it will have no visible effect.
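(To spell out the read-modify-write case: for a small write md computes,
roughly, new_parity = old_parity XOR old_data XOR new_data, so if
old_parity was never made correct in the first place, new_parity inherits
the same error, no matter how many updates of this kind are applied.)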
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-06-10 10:27:51.000000000 +1000
+++ ./drivers/md/raid5.c	2008-06-12 09:34:25.000000000 +1000
@@ -4094,7 +4094,9 @@ static int run(mddev_t *mddev)
 				" disk %d\n", bdevname(rdev->bdev,b),
 				raid_disk);
 			working_disks++;
-		}
+		} else
+			/* Cannot rely on bitmap to complete recovery */
+			conf->fullsync = 1;
 	}
 
 	/*
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: David Greaves @ 2008-06-12 8:03 UTC (permalink / raw)
To: Neil Brown; +Cc: Hubert Verstraete, linux-raid
Neil Brown wrote:
> When writing to the array, the new parity block will sometimes be
> calculated assuming that it was previously correct. If all updates to
> a particular parity block are of this sort, then it will still be
> incorrect when you lose a drive, and data recovered based on that
> parity block will be incorrect.
>
> Until you lose a drive, it will have no visible effect.
There is a slight chance that this happened to me recently - would a
echo check > /sys/block/mdX/md/sync_action
detect this?
and would
echo repair > /sys/block/mdX/md/sync_action
fix it?
David
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 9:12 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> On Wednesday June 4, hubskml@free.fr wrote:
>> Hello
>>
>> According to mdadm's man page:
>> "When creating a RAID5 array, mdadm will automatically create a degraded
>> array with an extra spare drive. This is because building the spare
>> into a degraded array is in general faster than resyncing the parity on
>> a non-degraded, but not clean, array. This feature can be over-ridden
>> with the --force option."
>>
>> Unfortunately, I'm seeing a kind of bug when I create a RAID5 array with
>> an internal bitmap, then stop the array before the initial
>> synchronization is done and restart the array.
>>
>> 1° When I create the array with an internal bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, all
>> disks are seen active and the array is not continuing the aborted
>> synchronization!
>> Note that I did not use the --assume-clean option.
>>
>> 2° When I create the array without a bitmap:
>> mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
>> I see the last disk as a spare disk. After the restart of the array, the
>> spare disk is still a spare disk and the array continues the
>> synchronization where it had stopped.
>>
>> In the case 1°, is this a bug or did I miss something?
>
> Thanks for the detailed report. Yes, this is a bug.
>
> The following patch fixes it, though I'm not 100% sure this is the
> right fix (it may cause too much resync in some cases, which is better
> than not enough, but not ideal).
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c 2008-06-10 10:27:51.000000000 +1000
> +++ ./drivers/md/raid5.c 2008-06-12 09:34:25.000000000 +1000
> @@ -4094,7 +4094,9 @@ static int run(mddev_t *mddev)
> " disk %d\n", bdevname(rdev->bdev,b),
> raid_disk);
> working_disks++;
> - }
> + } else
> + /* Cannot rely on bitmap to complete recovery */
> + conf->fullsync = 1;
> }
>
> /*
Thanks Neil, I can confirm this solves the issue.
Regarding the possible extra resync, I can't say.
Regards,
Hubert Verstraete
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 13:05 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> On Wednesday June 11, hubskml@free.fr wrote:
>> By the way and FYI, with my configuration, all disks on the same
>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>> synchronization duration is the same whether I'm using the option
>> --force or not.
>
> For this to be a valid test, you need to fill one drive up with
> garbage to ensure that a resync is no a no-op.
>
> If you don't use the "--force" option, then the recovery process will
> read from N-1 drives and write to 1 drive, all completely sequentially
> so it will go at a predictable speed.
>
> When you use "--force" it will read from N drive and check parity.
> When it finds an error it will re-write that parity block.
> So if the parity blocks happen to be all correct (as probably was the
> case in your experiment), it will run nice and fast. If the parity
> blocks happen to all be wrong (as is likely when first creating an
> array on drives that weren't an array before) it will be much slower.
I've just filled all the drives with /dev/zero and am currently building
a new array. Is this a valid test, or should I fill the drives with
/dev/random?
Hubert
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-12 16:59 UTC (permalink / raw)
To: Hubert Verstraete; +Cc: Neil Brown, linux-raid
On Thu, Jun 12, 2008 at 6:05 AM, Hubert Verstraete <hubskml@free.fr> wrote:
> Neil Brown wrote:
>>
>> On Wednesday June 11, hubskml@free.fr wrote:
>>>
>>> By the way and FYI, with my configuration, all disks on the same
>>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>>> synchronization duration is the same whether I'm using the option --force or
>>> not.
>>
>> For this to be a valid test, you need to fill one drive up with
>> garbage to ensure that a resync is no a no-op.
>>
>> If you don't use the "--force" option, then the recovery process will
>> read from N-1 drives and write to 1 drive, all completely sequentially
>> so it will go at a predictable speed.
>>
>> When you use "--force" it will read from N drive and check parity.
>> When it finds an error it will re-write that parity block.
>> So if the parity blocks happen to be all correct (as probably was the
>> case in your experiment), it will run nice and fast. If the parity
>> blocks happen to all be wrong (as is likely when first creating an
>> array on drives that weren't an array before) it will be much slower.
>
> I've just filled all the drives with /dev/zero and am currently building a
> new array. Is this a valid test or should I fill the drives with /dev/random
> ?
>
No, /dev/zero will not work for this test since the xor sum of zeroed
blocks is zero. The check operation has the following stages: read
the parity from disk, verify it is correct, and if it is not, calculate
and write back the correct parity. So, you need to ensure that the
parity check fails. Using /dev/urandom should be sufficient. Do not
use /dev/random as it will exhaust the entropy pool and then block.
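(For example, something along these lines before creating the array, with a
placeholder device name, would do:

  dd if=/dev/urandom of=/dev/sde bs=1M   # overwrite one member with random data

so that the parity computed across the members is almost certainly wrong.)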
--
Dan
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Dan Williams @ 2008-06-12 17:01 UTC (permalink / raw)
To: David Greaves; +Cc: Neil Brown, Hubert Verstraete, linux-raid
On Thu, Jun 12, 2008 at 1:03 AM, David Greaves <david@dgreaves.com> wrote:
> Neil Brown wrote:
>> When writing to the array, the new parity block will sometimes be
>> calculated assuming that it was previously correct. If all updates to
>> a particular parity block are of this sort, then it will still be
>> incorrect when you lose a drive, and data recovered based on that
>> parity block will be incorrect.
>>
>> Until you lose a drive, it will have no visible effect.
>
> There is a slight chance that this happened to me recently - would a
> echo check > /sys/block/mdX/md/sync_action
> detect this?
> and would
> echo repair > /sys/block/mdX/md/sync_action
> fix it?
>
'Yes' on both questions.
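(Roughly, reusing the placeholder array name from above:

  echo check > /sys/block/mdX/md/sync_action
  # wait for the check to finish, then:
  cat /sys/block/mdX/md/mismatch_cnt      # non-zero means bad parity was found
  echo repair > /sys/block/mdX/md/sync_action   # recomputes and rewrites parity
)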
--
Dan
* Re: RAID5 losing initial synchronization on restart when one disk is spare
From: Hubert Verstraete @ 2008-06-12 18:11 UTC (permalink / raw)
To: Dan Williams; +Cc: Neil Brown, linux-raid
Dan Williams wrote:
> On Thu, Jun 12, 2008 at 6:05 AM, Hubert Verstraete <hubskml@free.fr> wrote:
>> Neil Brown wrote:
>>> On Wednesday June 11, hubskml@free.fr wrote:
>>>> By the way and FYI, with my configuration, all disks on the same
>>>> controller, internal bitmap, v1 superblock, ... the initial RAID-5
>>>> synchronization duration is the same whether I'm using the option --force or
>>>> not.
>>> For this to be a valid test, you need to fill one drive up with
>>> garbage to ensure that a resync is no a no-op.
>>>
>>> If you don't use the "--force" option, then the recovery process will
>>> read from N-1 drives and write to 1 drive, all completely sequentially
>>> so it will go at a predictable speed.
>>>
>>> When you use "--force" it will read from N drive and check parity.
>>> When it finds an error it will re-write that parity block.
>>> So if the parity blocks happen to be all correct (as probably was the
>>> case in your experiment), it will run nice and fast. If the parity
>>> blocks happen to all be wrong (as is likely when first creating an
>>> array on drives that weren't an array before) it will be much slower.
>> I've just filled all the drives with /dev/zero and am currently building a
>> new array. Is this a valid test or should I fill the drives with /dev/random
>> ?
>
> No, /dev/zero will not work for this test since the xor sum of zeroed
> blocks is zero. The check operation has the following stages: read
> the parity from disk, verify it is correct, and if it is not correct
> calculate / writeback correct parity. So, you need to ensure that the
> parity check fails. Using /dev/urandom should be sufficient. Do not
> use /dev/random as it will exhaust the entropy pool and then block.
Confirmed: I got the same resync time after zeroing the hard drives.
Nevertheless, let's say I build the arrays on brand-new disks.
If I can assume these disks are full of zeros, I can freely choose my
initial resync method :)
And it seems my new Seagate disk really is all zeros (I'm currently
running a diff against /dev/zero).
Thanks for the explanations.
Hubert
Thread overview: 14 messages
2008-06-04 10:13 RAID5 losing initial synchronization on restart when one disk is spare Hubert Verstraete
2008-06-10 11:57 ` Hubert Verstraete
2008-06-10 22:56 ` Dan Williams
2008-06-11 9:27 ` Hubert Verstraete
2008-06-11 23:40 ` Neil Brown
2008-06-11 14:44 ` Hubert Verstraete
2008-06-11 23:38 ` Neil Brown
2008-06-12 13:05 ` Hubert Verstraete
2008-06-12 16:59 ` Dan Williams
2008-06-12 18:11 ` Hubert Verstraete
2008-06-11 23:45 ` Neil Brown
2008-06-12 8:03 ` David Greaves
2008-06-12 17:01 ` Dan Williams
2008-06-12 9:12 ` Hubert Verstraete