* Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 11:30 UTC
To: linux-raid

All,

I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
array consisting of 8 devices on kernel 2.6.38. After replacing some
hardware, I decided to trigger an MD repair by issuing:

    echo repair > /sys/devices/virtual/block/md5/md/sync_action

Directly after issuing this command, the mismatch_cnt is reset to 0 and
MD starts checking the array. However, the mismatch_cnt increases during
this check, ending at exactly the same count as seen before.
Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
'repair' work on other RAID-6 arrays?

Furthermore, theoretically it should be possible to indicate which
device in the RAID-6 array contains the inconsistent data, or am I
mistaken? If so, that would certainly be a nice feature to see
implemented, as it would help in diagnosing problems.

Please let me know your thoughts, as I'm quite keen to get my
mismatch_cnt back to 0 in order to see whether the new hardware works
properly!

Thanks,

Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 12:13 UTC
To: linux-raid

On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
> All,
>
> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
> array consisting of 8 devices on kernel 2.6.38. After replacing some
> hardware, I decided to trigger a MD repair by issuing:
> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>
> Directly after issuing this command, the mismatch_cnt is reset to 0 and
> MD starts checking the array. However, the mismatch_cnt increases during
> this check - resulting in exactly the same count as seen before.
> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
> 'repair' work on other RAID-6 arrays?
>
The mismatch_cnt is incremented during repair to indicate how many
errors were repaired. If you want to be certain though, you'd need to
re-run 'check' afterwards.

Cheers,
    Robin
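The sequence Robin describes (run 'repair', then re-run 'check' and read mismatch_cnt again) could be scripted roughly as below. This is only a sketch under stated assumptions: the helper names (read_attr, start_action, wait_until_idle, scrub) are made up for illustration, the md5 path is simply the one from the original post, and it assumes sync_action reads back "idle" once a pass has finished. Writing to sync_action on a real array needs root.

```python
import time
from pathlib import Path

# Path taken from the original post; adjust for your own array.
MD_DIR = Path("/sys/devices/virtual/block/md5/md")


def read_attr(md_dir, name):
    """Return the stripped contents of one md sysfs attribute."""
    return (Path(md_dir) / name).read_text().strip()


def start_action(md_dir, action):
    """Write 'check' or 'repair' to sync_action."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action!r}")
    (Path(md_dir) / "sync_action").write_text(action + "\n")


def wait_until_idle(md_dir, poll_seconds=30):
    """Poll sync_action until it reads 'idle' (assumed to mean 'pass finished')."""
    while read_attr(md_dir, "sync_action") != "idle":
        time.sleep(poll_seconds)


def scrub(md_dir, action, poll_seconds=30):
    """Run one pass and return the mismatch_cnt it ended with."""
    start_action(md_dir, action)
    wait_until_idle(md_dir, poll_seconds)
    return int(read_attr(md_dir, "mismatch_cnt"))


# Hypothetical usage on a real array:
#   repaired = scrub(MD_DIR, "repair")   # what the repair pass found
#   remaining = scrub(MD_DIR, "check")   # the number that should now be 0
```

The point of running two passes is that the count after 'repair' only says what was found, while the count after the follow-up 'check' says whether anything is still inconsistent.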
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-03-15 13:43 UTC
To: linux-raid

On 15/03/11 12:13, Robin Hill wrote:
> On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
>> All,
>>
>> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
>> array consisting of 8 devices on kernel 2.6.38. After replacing some
>> hardware, I decided to trigger a MD repair by issuing:
>> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>>
>> Directly after issuing this command, the mismatch_cnt is reset to 0 and
>> MD starts checking the array. However, the mismatch_cnt increases during
>> this check - resulting in exactly the same count as seen before.
>> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
>> 'repair' work on other RAID-6 arrays?
>>
> The mismatch_cnt is incremented during repair to indicate how many
> errors were repaired. If you want to be certain though, you'd need to
> re-run 'check' afterwards.

Sorry about that - I was sure the mismatch_cnt was reset after a repair
on a different machine, but apparently I was wrong. The 'check' is
running right now; I hope you are right! If not, I'll of course let you
know.

My other question still stands:
> Furthermore, theoretically it should be possible to indicate which
> device in the RAID-6 array contains the inconsistent data, or am I
> mistaken? If so, that would certainly be a nice feature to see
> implemented, as it would help in diagnosing problems.
Am I indeed correct in thinking this?

Thanks,

Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Robin Hill @ 2011-03-15 14:13 UTC
To: Bas van Schaik; +Cc: linux-raid

On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
> My other question is still standing:
> > Furthermore, theoretically it should be possible to indicate which
> > device in the RAID-6 array contains the inconsistent data, or am I
> > mistaking? If so, that would certainly be a nice feature to see
> > implemented, as it would help diagnosing problems.
> Am I indeed correct in thinking this?
>
I'm not sure. If it's a single data block that's failed then you should
be able to, for each disk, re-generate the data using the other disks
and the P parity, then validate against the Q parity (if it matches then
that disk is the incorrect one). You should also be able to detect
errors in either the P or Q parity (if one is valid for the data and the
other isn't). If there are multiple disks which are incorrect then I
don't think there's any way you can tell which (or even avoid having one
of the correct disks flagged as incorrect).

Cheers,
    Robin
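The per-disk trial Robin outlines can be sketched in user space as follows. This is not md code and not something the kernel is known to do during 'repair'; it is only an illustration of the idea. It assumes the usual RAID-6 arithmetic (P is the XOR of the data blocks, Q is the sum of g^i * D_i in GF(2^8) with polynomial 0x11d and g = 2), and the function names and the return values ("ok", "P", "Q", "D<n>", "unknown") are invented for the example.

```python
def gf_mul2(x):
    """Multiply one byte by the generator 2 in GF(2^8), polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x


def compute_pq(data_blocks):
    """Return (P, Q) for a list of equal-length data blocks (bytes)."""
    n = len(data_blocks[0])
    p, q = bytearray(n), bytearray(n)
    # Horner's scheme: walking from the highest disk down yields
    # Q = D0 + g*D1 + g^2*D2 + ...
    for block in reversed(data_blocks):
        for i in range(n):
            p[i] ^= block[i]
            q[i] = gf_mul2(q[i]) ^ block[i]
    return bytes(p), bytes(q)


def locate_single_error(data_blocks, p, q):
    """Guess which single block of a stripe is wrong, if any."""
    calc_p, calc_q = compute_pq(data_blocks)
    if (calc_p, calc_q) == (p, q):
        return "ok"
    if calc_p == p:
        return "Q"  # data agrees with P, so the stored Q is the odd one out
    if calc_q == q:
        return "P"  # data agrees with Q, so the stored P is the odd one out
    # Both parities disagree: rebuild each data block in turn from P and
    # the other data blocks, then see whether Q validates the result.
    for idx in range(len(data_blocks)):
        rebuilt = bytearray(p)
        for j, block in enumerate(data_blocks):
            if j != idx:
                for i, byte in enumerate(block):
                    rebuilt[i] ^= byte
        candidate = list(data_blocks)
        candidate[idx] = bytes(rebuilt)
        if compute_pq(candidate)[1] == q:
            return f"D{idx}"
    return "unknown"  # more than one block is wrong; cannot be localised
```

For an 8-device RAID-6 such as the one in this thread, each stripe would carry six data blocks plus P and Q. As Robin notes, the answer is only meaningful when exactly one block in the stripe is wrong.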
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Bas van Schaik @ 2011-04-01 22:44 UTC
To: linux-raid

On 03/15/2011 02:13 PM, Robin Hill wrote:
> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
>> My other question is still standing:
>>> Furthermore, theoretically it should be possible to indicate which
>>> device in the RAID-6 array contains the inconsistent data, or am I
>>> mistaking? If so, that would certainly be a nice feature to see
>>> implemented, as it would help diagnosing problems.
>> Am I indeed correct in thinking this?
> I'm not sure. If it's a single data block that's failed then you should
> be able to, for each disk, re-generate the data using the other disks
> and the P parity, then validate against the Q parity (if it matches then
> that disk is the incorrect one). You should also be able to detect
> errors in either the P or Q parity (if one is valid for the data and the
> other isn't). If there's multiple disks which are incorrect then I
> don't think there's any way you can tell which (or even avoid having one
> of the correct disks flagged as incorrect).

Indeed, that is what I was thinking. As I've just discovered some new
block mismatches (that's 2 weeks after the last repair!) on my 8x2TB
RAID6 array, it would be really nice to see this feature implemented...
I would be happy to contribute, but I am not very experienced in hacking
kernel C.

Any tips, tricks and/or suggestions, anyone?

Cheers,

Bas
* Re: Doing 'echo repair > /sys/devices/virtual/block/md?/md/sync_action' does not result in mismatch_cnt of 0 on RAID-6?
From: Rory Jaffe @ 2011-04-01 23:48 UTC
To: Bas van Schaik; +Cc: linux-raid

I had the same question and ended up looking at the source. The kernel
documentation was maddeningly vague about this.

/drivers/md/raid5.c (which handles both RAID-5 and RAID-6) has similar
comments in the functions handle_parity_checks5 and
handle_parity_checks6:

    /* handle a successful check operation, if parity is correct
     * we are done.  Otherwise update the mismatch count and repair
     * parity if !MD_RECOVERY_CHECK
     */

and the program logic does just that: it updates the count, then checks
for the flag, and repairs if the flag isn't set.

And in /drivers/md/md.c, the section that parses the command has the
following:

    if (cmd_match(page, "check"))
        set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
    else if (!cmd_match(page, "repair"))
        return -EINVAL;
    set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
    set_bit(MD_RECOVERY_SYNC, &mddev->recovery);

So it looks like the only difference between check and repair is the
MD_RECOVERY_CHECK flag, which is set for check only.
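To make the consequence of that flag concrete, here is a toy Python model of the behaviour Rory describes. It is not kernel code: it mirrors only the excerpt quoted above, the names parse_sync_action and run_pass are invented, and it counts one mismatch per inconsistent stripe for simplicity, which is an assumption of the model rather than how the kernel counts.

```python
MD_RECOVERY_CHECK = "MD_RECOVERY_CHECK"
MD_RECOVERY_REQUESTED = "MD_RECOVERY_REQUESTED"
MD_RECOVERY_SYNC = "MD_RECOVERY_SYNC"


def parse_sync_action(cmd):
    """Model of the quoted md.c branch: only 'check' sets MD_RECOVERY_CHECK."""
    flags = set()
    cmd = cmd.strip()
    if cmd == "check":
        flags.add(MD_RECOVERY_CHECK)
    elif cmd != "repair":
        raise ValueError("EINVAL")  # stands in for 'return -EINVAL'
    flags.add(MD_RECOVERY_REQUESTED)
    flags.add(MD_RECOVERY_SYNC)
    return flags


def run_pass(stripes_ok, flags):
    """Walk the stripes; count every mismatch, repair only if not 'check'.

    stripes_ok is a mutable list of booleans (True = parity consistent).
    Returns the mismatch count accumulated during this pass.
    """
    mismatch_cnt = 0  # reset at the start of each pass, as Bas observed
    for i, ok in enumerate(stripes_ok):
        if ok:
            continue
        mismatch_cnt += 1                   # counted in both modes
        if MD_RECOVERY_CHECK not in flags:  # repair mode: fix the parity
            stripes_ok[i] = True
    return mismatch_cnt
```

In this model a 'repair' pass reports the same count as the 'check' that preceded it, and only a further 'check' comes back as 0, which matches what Bas saw and what Robin suggested.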