* Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 11:30 UTC
To: linux-raid
All,
I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
array consisting of 8 devices on kernel 2.6.38. After replacing some
hardware, I decided to trigger an MD repair by issuing:
echo repair > /sys/devices/virtual/block/md5/md/sync_action
Directly after issuing this command, the mismatch_cnt is reset to 0 and
MD starts checking the array. However, the mismatch_cnt increases during
this check - resulting in exactly the same count as seen before.
Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
'repair' work on other RAID-6 arrays?
Furthermore, theoretically it should be possible to indicate which
device in the RAID-6 array contains the inconsistent data, or am I
mistaken? If so, that would certainly be a nice feature to see
implemented, as it would help with diagnosing problems.
Please let me know your thoughts, as I'm quite keen to get my
mismatch_cnt back to 0 in order to see whether the new hardware works
properly!
Thanks,
Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 12:13 UTC
To: linux-raid
On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
> All,
>
> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
> array consisting of 8 devices on kernel 2.6.38. After replacing some
> hardware, I decided to trigger an MD repair by issuing:
> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>
> Directly after issuing this command, the mismatch_cnt is reset to 0 and
> MD starts checking the array. However, the mismatch_cnt increases during
> this check - resulting in exactly the same count as seen before.
> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
> 'repair' work on other RAID-6 arrays?
>
The mismatch_cnt is incremented during repair to indicate how many
errors were repaired. If you want to be certain though, you'd need to
re-run 'check' afterwards.
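A minimal sketch of that repair-then-check sequence in Python, not a tested tool. It assumes the array's md directory (for example /sys/devices/virtual/block/md5/md, as in the command quoted above) exposes the sync_action and mismatch_cnt attributes, and that sync_action reads back as "idle" once a pass has finished. The function names, the md_dir parameter and the poll interval are made up for illustration.

```python
import time
from pathlib import Path


def read_attr(md_dir, name):
    """Return one md sysfs attribute (e.g. 'mismatch_cnt') as a stripped string."""
    return (Path(md_dir) / name).read_text().strip()


def start_action(md_dir, action):
    """Write 'check' or 'repair' to sync_action, like the echo command does."""
    if action not in ("check", "repair"):
        raise ValueError("unsupported action: %r" % (action,))
    (Path(md_dir) / "sync_action").write_text(action + "\n")


def wait_until_idle(md_dir, poll_seconds=10.0):
    """Poll sync_action until it reads 'idle' (assumed to mean the pass ended)."""
    while read_attr(md_dir, "sync_action") != "idle":
        time.sleep(poll_seconds)


def repair_then_check(md_dir, poll_seconds=10.0):
    """Run a repair pass, then a check pass; return both mismatch counts."""
    start_action(md_dir, "repair")
    wait_until_idle(md_dir, poll_seconds)
    seen_during_repair = int(read_attr(md_dir, "mismatch_cnt"))

    start_action(md_dir, "check")
    wait_until_idle(md_dir, poll_seconds)
    seen_during_check = int(read_attr(md_dir, "mismatch_cnt"))
    return seen_during_repair, seen_during_check
```

Run as root, repair_then_check("/sys/devices/virtual/block/md5/md") would return the count seen during the repair pass followed by the count from the verifying check. Only the second number is expected to be 0 if the repair took effect.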
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 13:43 UTC
To: linux-raid
On 15/03/11 12:13, Robin Hill wrote:
> On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
>> All,
>>
>> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
>> array consisting of 8 devices on kernel 2.6.38. After replacing some
>> hardware, I decided to trigger an MD repair by issuing:
>> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>>
>> Directly after issuing this command, the mismatch_cnt is reset to 0 and
>> MD starts checking the array. However, the mismatch_cnt increases during
>> this check - resulting in exactly the same count as seen before.
>> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
>> 'repair' work on other RAID-6 arrays?
>>
> The mismatch_cnt is incremented during repair to indicate how many
> errors were repaired. If you want to be certain though, you'd need to
> re-run 'check' afterwards.
Sorry about that - I was sure the mismatch_cnt was reset after a repair
on a different machine, but apparently I was wrong. The 'check' is
running right now, I hope you are right! If not, of course I'll let you
know.
My other question is still standing:
> Furthermore, theoretically it should be possible to indicate which
> device in the RAID-6 array contains the inconsistent data, or am I
> mistaken? If so, that would certainly be a nice feature to see
> implemented, as it would help diagnosing problems.
Am I indeed correct in thinking this?
Thanks,
Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 14:13 UTC
To: Bas van Schaik; +Cc: linux-raid
On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
> My other question is still standing:
> > Furthermore, theoretically it should be possible to indicate which
> > device in the RAID-6 array contains the inconsistent data, or am I
> > mistaken? If so, that would certainly be a nice feature to see
> > implemented, as it would help diagnosing problems.
> Am I indeed correct in thinking this?
>
I'm not sure. If only a single data block is wrong then you should be
able to find it: for each disk in turn, re-generate its data from the
other disks and the P parity, then validate the result against the Q
parity (if it matches, that disk is the incorrect one). You should also
be able to detect an error in either the P or the Q parity (one is
valid for the data and the other isn't). If multiple disks are
incorrect then I don't think there's any way to tell which (or even to
avoid having one of the correct disks flagged as incorrect).
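The single-error case can be sketched in Python as follows. This is an illustration under stated assumptions, not md's code: it compares P and Q syndromes instead of literally trying each disk in turn, which should give the same answer when exactly one block is bad. It assumes the usual RAID-6 arithmetic (GF(2^8) with polynomial 0x11d and generator 2, Q being the sum of 2^i * D_i) and that data_blocks is ordered by data-disk index within the stripe. All names are hypothetical.

```python
def _gf_double(x):
    """Multiply by the generator 2 in GF(2^8), reducing by polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


# Exponent/log tables for the generator 2 (255 non-zero field elements).
GF_EXP = [0] * 255
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x = _gf_double(_x)


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[(GF_LOG[a] + GF_LOG[b]) % 255]


def syndromes(data_blocks):
    """Compute the P (plain XOR) and Q (weighted XOR) blocks for one stripe."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for index, block in enumerate(data_blocks):
        coeff = GF_EXP[index % 255]
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_single_error(data_blocks, p_stored, q_stored):
    """Return 'ok', 'P', 'Q', 'D<index>' or 'unknown' for one stripe."""
    p_calc, q_calc = syndromes(data_blocks)
    p_delta = [a ^ b for a, b in zip(p_calc, p_stored)]
    q_delta = [a ^ b for a, b in zip(q_calc, q_stored)]
    if not any(p_delta) and not any(q_delta):
        return "ok"
    if not any(q_delta):
        return "P"      # data agrees with Q, so P itself is suspect
    if not any(p_delta):
        return "Q"      # data agrees with P, so Q itself is suspect
    candidates = set()
    for pd, qd in zip(p_delta, q_delta):
        if pd == 0 and qd == 0:
            continue    # this byte position is consistent
        if pd == 0 or qd == 0:
            return "unknown"
        # One bad data disk z gives q_delta = 2^z * p_delta at every bad byte.
        candidates.add((GF_LOG[qd] - GF_LOG[pd]) % 255)
    if len(candidates) == 1:
        z = candidates.pop()
        if z < len(data_blocks):
            return "D%d" % z
    return "unknown"
```

With more than one bad block the syndromes may disagree from byte to byte, in which case the sketch returns 'unknown'. They may also happen to agree on a disk that is actually fine, which is the misflagging risk mentioned above.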
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-04-01 22:44 UTC
To: linux-raid
On 03/15/2011 02:13 PM, Robin Hill wrote:
> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote
>> My other question is still standing:
>>> Furthermore, theoretically it should be possible to indicate which
>>> device in the RAID-6 array contains the inconsistent data, or am I
>>> mistaken? If so, that would certainly be a nice feature to see
>>> implemented, as it would help diagnosing problems.
>> Am I indeed correct in thinking this?
> I'm not sure. If it's a single data block that's failed then you should
> be able to, for each disk, re-generate the data using the other disks
> and the P parity, then validate against the Q parity (if it matches then
> that disk is the incorrect one). You should also be able to detect
> errors in either the P or Q parity (if one is valid for the data and the
> other isn't). If there's multiple disks which are incorrect then I
> don't think there's any way you can tell which (or even avoid having one
> of the correct disks flagged as incorrect).
Indeed, that is what I was thinking. As I've just discovered some new
block mismatches (that's 2 weeks after the last repair!) on my 8x2TB
RAID6 array, it would be really nice to see this feature implemented...
I would be happy to contribute, but I am not very experienced in hacking
kernel C.
Any tips, tricks and/or suggestions anyone?
Cheers,
Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Rory Jaffe @ 2011-04-01 23:48 UTC
To: Bas van Schaik; +Cc: linux-raid
I had the same question and ended up looking at the source. The kernel
documentation was maddeningly vague about this.
drivers/md/raid5.c (which handles both RAID-5 and RAID-6) has similar
comments in the functions handle_parity_checks5 and
handle_parity_checks6:

    /* handle a successful check operation, if parity is correct
     * we are done.  Otherwise update the mismatch count and repair
     * parity if !MD_RECOVERY_CHECK
     */

and the program logic does just that: it updates the count, then checks
the flag, and repairs if the flag isn't set.

And in drivers/md/md.c the section that parses the command has the
following:

    if (cmd_match(page, "check"))
        set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
    else if (!cmd_match(page, "repair"))
        return -EINVAL;
    set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
    set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
So it looks like the only difference between check and repair is the
MD_RECOVERY_CHECK flag, which is set for check only.
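To see why that makes mismatch_cnt non-zero after a repair, here is a toy model in Python. It is not the kernel code: it only mirrors the branch quoted above, uses a single XOR parity per stripe, and counts one mismatch per stripe, all of which are simplifications. The function names and the stripe dictionaries are invented for the example.

```python
from functools import reduce
from operator import xor


def parse_action(cmd):
    """Mirror the quoted branch: only 'check' sets MD_RECOVERY_CHECK."""
    flags = set()
    if cmd == "check":
        flags.add("MD_RECOVERY_CHECK")
    elif cmd != "repair":
        raise ValueError("EINVAL: %r" % (cmd,))
    flags.add("MD_RECOVERY_REQUESTED")
    flags.add("MD_RECOVERY_SYNC")
    return flags


def scrub(stripes, flags):
    """Walk all stripes; count mismatches, rewrite parity unless checking."""
    mismatches = 0
    for stripe in stripes:
        expected = reduce(xor, stripe["data"], 0)
        if stripe["parity"] != expected:
            mismatches += 1                      # counted in both modes
            if "MD_RECOVERY_CHECK" not in flags:
                stripe["parity"] = expected      # repair mode only
    return mismatches


stripes = [
    {"data": [1, 2], "parity": 0},   # wrong: 1 ^ 2 == 3
    {"data": [4, 5], "parity": 0},   # wrong: 4 ^ 5 == 1
    {"data": [7, 7], "parity": 0},   # correct
]
count_check_before = scrub(stripes, parse_action("check"))
count_repair = scrub(stripes, parse_action("repair"))
count_check_after = scrub(stripes, parse_action("check"))
```

In this model the repair pass reports the same count as the check before it, because each mismatch is counted before it is fixed. Only the following check pass reports 0, which matches the behaviour described earlier in the thread.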
On Fri, Apr 1, 2011 at 3:44 PM, Bas van Schaik <bas@tuxes.nl> wrote:
> On 03/15/2011 02:13 PM, Robin Hill wrote:
>> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote
>>> My other question is still standing:
>>>> Furthermore, theoretically it should be possible to indicate which
>>>> device in the RAID-6 array contains the inconsistent data, or am I
>>>> mistaken? If so, that would certainly be a nice feature to see
>>>> implemented, as it would help diagnosing problems.
>>> Am I indeed correct in thinking this?
>> I'm not sure. If it's a single data block that's failed then you should
>> be able to, for each disk, re-generate the data using the other disks
>> and the P parity, then validate against the Q parity (if it matches then
>> that disk is the incorrect one). You should also be able to detect
>> errors in either the P or Q parity (if one is valid for the data and the
>> other isn't). If there's multiple disks which are incorrect then I
>> don't think there's any way you can tell which (or even avoid having one
>> of the correct disks flagged as incorrect).
> Indeed, that is what I was thinking. As I've just discovered some new
> block mismatches (that's 2 weeks after the last repair!) on my 8x2TB
> RAID6 array, it would be really nice to see this feature implemented...
> I would be happy to contribute, but I am not very experienced in hacking
> kernel C.
>
> Any tips, tricks and/or suggestions anyone?
>
> Cheers,
>
> Bas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>