linux-raid.vger.kernel.org archive mirror
* Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 11:30 UTC
  To: linux-raid

All,

I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
array consisting of 8 devices on kernel 2.6.38. After replacing some
hardware, I decided to trigger an MD repair by issuing:
  echo repair > /sys/devices/virtual/block/md5/md/sync_action

Directly after issuing this command, the mismatch_cnt is reset to 0 and
MD starts checking the array. However, the mismatch_cnt increases during
this check - resulting in exactly the same count as seen before.
Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
'repair' work on other RAID-6 arrays?

Furthermore, theoretically it should be possible to indicate which
device in the RAID-6 array contains the inconsistent data, or am I
mistaken? If so, that would certainly be a nice feature to see
implemented, as it would help diagnose problems.

Please let me know your thoughts, as I'm quite keen to get my
mismatch_cnt back to 0 in order to see whether the new hardware works
properly!

Thanks,

  Bas

* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 12:13 UTC
  To: linux-raid

On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:

> All,
> 
> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
> array consisting of 8 devices on kernel 2.6.38. After replacing some
> hardware, I decided to trigger an MD repair by issuing:
>   echo repair > /sys/devices/virtual/block/md5/md/sync_action
> 
> Directly after issuing this command, the mismatch_cnt is reset to 0 and
> MD starts checking the array. However, the mismatch_cnt increases during
> this check - resulting in exactly the same count as seen before.
> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
> 'repair' work on other RAID-6 arrays?
> 
The mismatch_cnt is incremented during repair to indicate how many
errors were repaired. If you want to be certain though, you'd need to
re-run 'check' afterwards.
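
A minimal C sketch of the repair-then-check sequence suggested here, driving the same sysfs files as the echo command. The helper names (md_attr_path, md_read_attr, md_set_sync_action, md_is_idle, md_read_mismatch_cnt) are hypothetical; md_dir is assumed to be the array's md directory (e.g. /sys/devices/virtual/block/md5/md), and the sketch assumes sync_action reads back "idle" once a pass has finished.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations such as mkdtemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<md_dir>/<attr>" into buf; returns 0 on success, -1 if it does not fit. */
static int md_attr_path(char *buf, size_t len, const char *md_dir, const char *attr)
{
    int n = snprintf(buf, len, "%s/%s", md_dir, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the first line of an md attribute file, minus its trailing newline. */
static int md_read_attr(const char *md_dir, const char *attr, char *out, size_t len)
{
    char path[512];
    if (md_attr_path(path, sizeof path, md_dir, attr) != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(out, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Equivalent of: echo <action> > <md_dir>/sync_action */
static int md_set_sync_action(const char *md_dir, const char *action)
{
    assert(md_dir != NULL && action != NULL);
    char path[512];
    if (md_attr_path(path, sizeof path, md_dir, "sync_action") != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", action) >= 0;
    /* stdio buffers the write, so a rejected action surfaces at fclose() */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* 1 if sync_action reads "idle" (no check/repair running), 0 if busy, -1 on error. */
static int md_is_idle(const char *md_dir)
{
    char buf[64];
    if (md_read_attr(md_dir, "sync_action", buf, sizeof buf) != 0)
        return -1;
    return strcmp(buf, "idle") == 0;
}

/* Parse mismatch_cnt into *out; returns 0 on success, -1 on error. */
static int md_read_mismatch_cnt(const char *md_dir, unsigned long long *out)
{
    char buf[64];
    char *end;
    if (md_read_attr(md_dir, "mismatch_cnt", buf, sizeof buf) != 0)
        return -1;
    *out = strtoull(buf, &end, 10);
    return (end == buf || *end != '\0') ? -1 : 0;
}
```

Usage, assuming root access: call md_set_sync_action(dir, "repair"), poll md_is_idle(dir) with sleep() until it returns 1, then do the same with "check" and read mismatch_cnt. The count after the repair pass is what repair found; only a non-zero count after the follow-up check would mean mismatches remain.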

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 13:43 UTC
  To: linux-raid

On 15/03/11 12:13, Robin Hill wrote:
> On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
>> All,
>>
>> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
>> array consisting of 8 devices on kernel 2.6.38. After replacing some
>> hardware, I decided to trigger an MD repair by issuing:
>>   echo repair > /sys/devices/virtual/block/md5/md/sync_action
>>
>> Directly after issuing this command, the mismatch_cnt is reset to 0 and
>> MD starts checking the array. However, the mismatch_cnt increases during
>> this check - resulting in exactly the same count as seen before.
>> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
>> 'repair' work on other RAID-6 arrays?
>>
> The mismatch_cnt is incremented during repair to indicate how many
> errors were repaired. If you want to be certain though, you'd need to
> re-run 'check' afterwards.
Sorry about that - I was sure the mismatch_cnt was reset after a repair
on a different machine, but apparently I was wrong. The 'check' is
running right now; I hope you are right! If not, of course I'll let you
know.

My other question is still standing:
> Furthermore, theoretically it should be possible to indicate which
> device in the RAID-6 array contains the inconsistent data, or am I
> mistaken? If so, that would certainly be a nice feature to see
> implemented, as it would help diagnose problems.
Am I indeed correct in thinking this?

Thanks,

  Bas

* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 14:13 UTC
  To: Bas van Schaik; +Cc: linux-raid

On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:

> My other question is still standing:
> > Furthermore, theoretically it should be possible to indicate which
> > device in the RAID-6 array contains the inconsistent data, or am I
> > mistaken? If so, that would certainly be a nice feature to see
> > implemented, as it would help diagnose problems.
> Am I indeed correct in thinking this?
> 
I'm not sure. If it's a single data block that's failed then you should
be able to, for each disk, regenerate the data using the other disks
and the P parity, then validate against the Q parity (if it matches, then
that disk is the incorrect one). You should also be able to detect
errors in either the P or Q parity (if one is valid for the data and the
other isn't). If there are multiple disks that are incorrect, then I
don't think there's any way you can tell which (or even avoid having one
of the correct disks flagged as incorrect).
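
A minimal C sketch of the procedure just described, for one byte position across a stripe. It assumes the GF(2^8) arithmetic used by Linux md's RAID-6 (polynomial 0x11d, generator {02}), with Q = sum of g^i * D_i over the data blocks, where i is the block's data slot within the stripe rather than a physical device number. The function and constant names are hypothetical, and a real tool would have to run this for every byte of each block and require all mismatching bytes to point at the same disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DATA 16 /* arbitrary cap for this sketch (must stay below 255) */

enum { RAID6_OK = -1, RAID6_BAD_P = -2, RAID6_BAD_Q = -3, RAID6_UNKNOWN = -4 };

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;
        /* multiply a by x, reducing by 0x11d when the top bit falls off */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* P = D_0 ^ D_1 ^ ... ^ D_{n-1} */
static uint8_t syndrome_p(const uint8_t *d, size_t n)
{
    uint8_t p = 0;
    for (size_t i = 0; i < n; i++)
        p ^= d[i];
    return p;
}

/* Q = sum of g^i * D_i with g = {02}, evaluated with Horner's rule. */
static uint8_t syndrome_q(const uint8_t *d, size_t n)
{
    uint8_t q = 0;
    for (size_t i = n; i-- > 0;)
        q = (uint8_t)(gf_mul(q, 2) ^ d[i]);
    return q;
}

/* Given data bytes d[0..n-1] and the stored parity bytes p and q, return the
 * index of the single bad data block, or one of the RAID6_* verdicts. */
static int locate_bad_block(const uint8_t *d, size_t n, uint8_t p, uint8_t q)
{
    assert(n >= 1 && n <= MAX_DATA);
    int p_ok = syndrome_p(d, n) == p;
    int q_ok = syndrome_q(d, n) == q;
    if (p_ok && q_ok)
        return RAID6_OK;
    if (p_ok)
        return RAID6_BAD_Q; /* data agrees with P, so Q is the odd one out */
    if (q_ok)
        return RAID6_BAD_P; /* data agrees with Q, so P is the odd one out */

    /* Both disagree: rebuild each data block in turn from P and the others,
     * then see whether Q now matches. With exactly one bad data block only
     * its slot passes; with several bad blocks this can still name a
     * healthy block, as noted above. */
    int found = RAID6_UNKNOWN;
    for (size_t i = 0; i < n; i++) {
        uint8_t trial[MAX_DATA];
        memcpy(trial, d, n);
        trial[i] = 0;
        trial[i] = (uint8_t)(p ^ syndrome_p(trial, n));
        if (syndrome_q(trial, n) == q) {
            if (found != RAID6_UNKNOWN)
                return RAID6_UNKNOWN; /* ambiguous */
            found = (int)i;
        }
    }
    return found;
}
```

With 6 data blocks per stripe (8 devices minus P and Q, as in Bas's array), corrupting data block 3 makes locate_bad_block report 3, while corrupting only the stored P or only Q yields RAID6_BAD_P or RAID6_BAD_Q.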

Cheers,
    Robin

* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-04-01 22:44 UTC
  To: linux-raid

On 03/15/2011 02:13 PM, Robin Hill wrote:
> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
>> My other question is still standing:
>>> Furthermore, theoretically it should be possible to indicate which
>>> device in the RAID-6 array contains the inconsistent data, or am I
>>> mistaken? If so, that would certainly be a nice feature to see
>>> implemented, as it would help diagnose problems.
>> Am I indeed correct in thinking this?
> I'm not sure. If it's a single data block that's failed then you should
> be able to, for each disk, regenerate the data using the other disks
> and the P parity, then validate against the Q parity (if it matches, then
> that disk is the incorrect one). You should also be able to detect
> errors in either the P or Q parity (if one is valid for the data and the
> other isn't). If there are multiple disks that are incorrect, then I
> don't think there's any way you can tell which (or even avoid having one
> of the correct disks flagged as incorrect).
Indeed, that is what I was thinking. As I've just discovered some new
block mismatches (only 2 weeks after the last repair!) on my 8x2TB
RAID-6 array, it would be really nice to see this feature implemented...
I would be happy to contribute, but I am not very experienced in hacking
on kernel C code.

Any tips, tricks and/or suggestions anyone?
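
One possible starting point, assuming the standard RAID-6 construction over GF(2^8) with generator g (g = {02} in Linux md): the per-disk trial rebuild can be replaced by a closed form. Below, D_i are the n data blocks as read from disk, P^* and Q^* the stored parity, P and Q the parity recomputed from the data as read, and E (non-zero) the error in the single corrupted data block z; everything is evaluated per byte position.

```latex
\begin{aligned}
P &= \bigoplus_{i=0}^{n-1} D_i, &
Q &= \bigoplus_{i=0}^{n-1} g^{i} \cdot D_i, \\
P_\Delta &= P \oplus P^{*} = E, &
Q_\Delta &= Q \oplus Q^{*} = g^{z} \cdot E, \\
z &\equiv \log_g Q_\Delta - \log_g P_\Delta \pmod{255}.
\end{aligned}
```

If only Q_Δ is non-zero, Q is the bad block; if only P_Δ is non-zero, P is. If the computed z is not a valid data slot (z ≥ n), more than one block must be wrong, and even a valid z is only trustworthy under the single-error assumption.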

Cheers,

  Bas

* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Rory Jaffe @ 2011-04-01 23:48 UTC
  To: Bas van Schaik; +Cc: linux-raid

I had the same question and ended up looking at the source. The kernel
documentation was maddeningly vague about this.

/drivers/md/raid5.c (which handles both RAID-5 and RAID-6) has similar
comments in the functions handle_parity_checks5 and handle_parity_checks6:

    /* handle a successful check operation, if parity is correct
     * we are done.  Otherwise update the mismatch count and repair
     * parity if !MD_RECOVERY_CHECK
     */
and the program logic does just that: it updates the count, then checks
the flag, and repairs parity if the flag isn't set.
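
A toy C model of that logic (not kernel code) shows why repair leaves mismatch_cnt non-zero. It assumes single XOR parity and counts mismatched stripes; the real counter appears to advance by a stripe's worth of sectors (STRIPE_SECTORS) per mismatch instead. struct stripe, compute_parity and sync_pass are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NDATA 4

/* Hypothetical toy stripe: NDATA data bytes plus one XOR parity byte. */
struct stripe {
    uint8_t data[NDATA];
    uint8_t parity;
};

static uint8_t compute_parity(const struct stripe *s)
{
    uint8_t p = 0;
    for (size_t i = 0; i < NDATA; i++)
        p ^= s->data[i];
    return p;
}

/* One "check" (check_only == true) or "repair" (check_only == false) pass.
 * The counter starts at 0 for every pass. If parity is correct we move on;
 * otherwise count the mismatch and, unless this is a check, rewrite parity
 * from the data. Returns the number of mismatched stripes found. */
static unsigned long long sync_pass(struct stripe *s, size_t n, bool check_only)
{
    assert(s != NULL || n == 0);
    unsigned long long mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t p = compute_parity(&s[i]);
        if (p == s[i].parity)
            continue;
        mismatches++;
        if (!check_only)
            s[i].parity = p;
    }
    return mismatches;
}
```

A repair pass reports the same number a check pass found, because it counts what it found (and fixed); only a second check reports 0. Like the quoted comment, the model "repairs" by rewriting parity from the data; it never decides which data block is wrong.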

And in /drivers/md/md.c the section that parses the command has the following:

    if (cmd_match(page, "check"))
        set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
    else if (!cmd_match(page, "repair"))
        return -EINVAL;
    set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
    set_bit(MD_RECOVERY_SYNC, &mddev->recovery);

So it looks like the only difference between check and repair is the
MD_RECOVERY_CHECK flag, which is set for check only.
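
A self-contained C model of that check/repair branch, handy for convincing yourself that the two commands differ only in the CHECK bit. The flag values and parse_sync_request are hypothetical; cmd_match here is a reimplementation that, like the md.c helper of the same name, accepts an optional trailing newline (which echo adds). The other actions that sync_action accepts are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical bit values for this sketch; the kernel's MD_RECOVERY_* bits differ. */
enum {
    RECOVERY_SYNC = 1 << 0,
    RECOVERY_REQUESTED = 1 << 1,
    RECOVERY_CHECK = 1 << 2,
};

/* True if page equals cmd, optionally followed by one trailing newline. */
static int cmd_match(const char *page, const char *cmd)
{
    size_t n = strlen(cmd);
    if (strncmp(page, cmd, n) != 0)
        return 0;
    return page[n] == '\0' || (page[n] == '\n' && page[n + 1] == '\0');
}

/* Returns 0 and ORs the request flags into *recovery for "check" or
 * "repair"; returns -EINVAL (leaving *recovery untouched) otherwise. */
static int parse_sync_request(const char *page, unsigned *recovery)
{
    assert(page != NULL && recovery != NULL);
    if (cmd_match(page, "check"))
        *recovery |= RECOVERY_CHECK;
    else if (!cmd_match(page, "repair"))
        return -EINVAL;
    *recovery |= RECOVERY_REQUESTED | RECOVERY_SYNC;
    return 0;
}
```

Both commands set REQUESTED and SYNC; only check adds CHECK, and that is the bit the raid5.c logic consults before rewriting parity.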


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
