* data scrubbing
@ 2011-07-29  8:50 Nikolay Kichukov
  2011-07-29 10:03 ` Mikael Abrahamsson
  2011-07-29 17:17 ` Thomas Harold
  0 siblings, 2 replies; 8+ messages in thread
From: Nikolay Kichukov @ 2011-07-29  8:50 UTC
To: linux-raid

Hi all,

It was recently discussed on this list that data scrubbing is good practice for some RAID levels. Can someone advise which RAID levels need that operation scheduled on a regular basis? Perhaps all arrays that have the sync_action property:

/sys/block/md*/md/sync_action

For example, is it good for a raid1 array?

Cheers,
-Nik

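A minimal sketch of the test proposed in the question above: list the md arrays that expose a sync_action attribute. The function name and the optional sysfs-root argument are assumptions made for illustration (the argument only exists so the function can be tried against a scratch directory). Whether the presence of the attribute is a sufficient criterion is exactly what is being asked, so treat the output as a list of candidates.

```sh
#!/bin/sh
# list_scrubbable_arrays [SYSFS_ROOT]
# Print the name of every md array under SYSFS_ROOT (default: /sys)
# that has an md/sync_action attribute. Hypothetical helper.
list_scrubbable_arrays() {
    root="${1:-/sys}"
    for f in "$root"/block/md*/md/sync_action; do
        # An unmatched glob is left as a literal string, so skip non-files.
        [ -f "$f" ] || continue
        # f is <root>/block/<array>/md/sync_action; go up two levels
        # and keep the last path component, e.g. md0.
        basename "$(dirname "$(dirname "$f")")"
    done
}
```

Calling `list_scrubbable_arrays` with no argument inspects the running system.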
* Re: data scrubbing
  2011-07-29  8:50 data scrubbing Nikolay Kichukov
@ 2011-07-29 10:03 ` Mikael Abrahamsson
  2011-07-29 13:25   ` Nikolay Kichukov
  2011-07-29 17:17 ` Thomas Harold
  1 sibling, 1 reply; 8+ messages in thread
From: Mikael Abrahamsson @ 2011-07-29 10:03 UTC
To: Nikolay Kichukov; +Cc: linux-raid

On Fri, 29 Jul 2011, Nikolay Kichukov wrote:

> For example is it good for raid1 array?

Yes, it's good for all RAID levels that have any kind of redundancy. You want to read the information on the drives regularly to make sure it can still be read; if it can't, it can be recomputed from the redundant data (the other mirror copy or the parity) and rewritten.

Otherwise, rarely read data might have an error on one drive, and then another drive fails, and when you try to rebuild you suddenly don't have this data anywhere (RAID1 and RAID5), and you had no idea about it.

Scrubbing is good; do it regularly (at least monthly).

--
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: data scrubbing
  2011-07-29 10:03 ` Mikael Abrahamsson
@ 2011-07-29 13:25   ` Nikolay Kichukov
  2011-07-29 20:48     ` Beolach
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Kichukov @ 2011-07-29 13:25 UTC
To: Mikael Abrahamsson; +Cc: linux-raid

Hi,

This is good to know!

I just performed a check on a raid1 and got:

Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches found: 128

So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?

cat /sys/block/md1/md/mismatch_cnt
128

Cheers,
-Nik

* Re: data scrubbing
  2011-07-29 13:25   ` Nikolay Kichukov
@ 2011-07-29 20:48     ` Beolach
  2011-07-29 21:51       ` Mathias Burén
  0 siblings, 1 reply; 8+ messages in thread
From: Beolach @ 2011-07-29 20:48 UTC
To: Nikolay Kichukov; +Cc: Mdadm

On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov <hijacker@oldum.net> wrote:
> Just performed a check on a raid1 and got:
>
> Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches found: 128
>
> So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
>
> cat /sys/block/md1/md/mismatch_cnt
> 128

That depends on if you did a "check" or a "repair" - see the SCRUBBING AND MISMATCHES section of the md(4) man page:

    "If check was used, then no action is taken to handle the mismatch,
    it is simply recorded. If repair was used, then a mismatch will
    be repaired in the same way that resync repairs arrays."

Good luck,
Beolach
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

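The distinction between the two actions can be sketched as a small wrapper around the sync_action attribute mentioned earlier in the thread. This is only an illustration: the function name `start_scrub` and the optional sysfs-root argument are assumptions, not part of mdadm or the kernel. Only the two values quoted from md(4) above are accepted.

```sh
#!/bin/sh
# start_scrub ACTION ARRAY [SYSFS_ROOT]
# Write ACTION ("check" or "repair") to the array's sync_action attribute.
# Hypothetical helper; SYSFS_ROOT defaults to /sys and exists only so the
# function can be exercised against a scratch directory.
start_scrub() {
    action="$1"
    array="$2"
    root="${3:-/sys}"
    case "$action" in
        check|repair) ;;
        *)
            echo "start_scrub: action must be 'check' or 'repair'" >&2
            return 2
            ;;
    esac
    attr="$root/block/$array/md/sync_action"
    if [ ! -w "$attr" ]; then
        echo "start_scrub: $attr is missing or not writable" >&2
        return 1
    fi
    # "check" only counts mismatches; "repair" also rewrites them.
    echo "$action" > "$attr"
}
```

Run as root, `start_scrub check md1` would start a read-only pass on the array from the log above. Per the man page excerpt, the mismatches it reports are recorded but left untouched.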
* Re: data scrubbing
  2011-07-29 20:48     ` Beolach
@ 2011-07-29 21:51       ` Mathias Burén
  2011-07-29 22:16         ` David Brown
  2011-07-29 22:37         ` Beolach
  0 siblings, 2 replies; 8+ messages in thread
From: Mathias Burén @ 2011-07-29 21:51 UTC
To: Beolach; +Cc: Nikolay Kichukov, Mdadm

On 29 July 2011 21:48, Beolach <beolach@gmail.com> wrote:
> That depends on if you did a "check" or a "repair" - see the SCRUBBING
> AND MISMATCHES section of the md(4) man page:
> "If check was used, then no action is taken to handle the mismatch,
> it is simply recorded. If repair was used, then a mismatch will
> be repaired in the same way that resync repairs arrays."

Sorry to chime in like this. After reading the above, is there a reason why anyone shouldn't _always_ use repair instead of check on a weekly RAID6 check? You have to run repair anyway after a check if any issues are found, right?

Or does the system become vulnerable during a repair? (less redundant)

Thanks,
Mathias

* Re: data scrubbing
  2011-07-29 21:51       ` Mathias Burén
@ 2011-07-29 22:16         ` David Brown
  2011-07-29 22:37         ` Beolach
  1 sibling, 0 replies; 8+ messages in thread
From: David Brown @ 2011-07-29 22:16 UTC
To: linux-raid

On 29/07/11 23:51, Mathias Burén wrote:
> Sorry to chime in like this. After reading the above, is there a
> reason why anyone shouldn't _always_ use repair instead of check on a
> weekly RAID6 check? You have to run repair anyway after a check if any
> issues are found, right?
>
> Or does the system become vulnerable during a repair? (less redundant)

If you do a repair, then when a mismatch is found, one of the disks is taken as the "bad" one and its block is re-created. For raid1, the first copy is assumed correct. For raid5/6, the data blocks are assumed correct and the parity blocks are re-created.

As Neil Brown explained on his blog, without any more information this is as good as md raid can do. However, it is not necessarily as good as /you/ can do. For example, you might be able to determine which files use the blocks in the mismatched stripe, and figure out which block was bad. Or for a 3-disk raid1 you could pick the bad block as the odd one out (assuming the other two matched). For raid6, it's possible to spot whether it is a single-disk mismatch and correct that one disk: for each disk in turn, assume it is missing and re-create it from the other disks using normal raid6 recovery. If the stripe is then consistent, you've fixed the mismatch.

However, such approaches are not necessarily the correct ones. Thus "repair" just does the simplest and fastest correction of the mismatch, and "check" does not change the stripe, in case you want to manually pick a different method.

<http://neil.brown.name/blog/20100211050355>

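The "odd one out" idea for a 3-disk raid1 can be sketched as a majority vote over three copies of the same block range. This assumes the three copies have already been dumped to ordinary files by some other means (locating the matching offsets on each member is not covered here), and the function name is hypothetical. As noted above, a majority is a heuristic and not proof of which copy is right.

```sh
#!/bin/sh
# odd_one_out A B C
# A, B and C are files assumed to hold the same block range as read from
# each of three raid1 members. Prints the path of the copy that disagrees
# with the other two, "none" if all three are identical, or "ambiguous"
# if no two copies agree. Hypothetical helper for illustration.
odd_one_out() {
    a="$1"
    b="$2"
    c="$3"
    if cmp -s "$a" "$b"; then
        # A and B agree, so C is either identical or the outlier.
        if cmp -s "$a" "$c"; then
            echo none
        else
            echo "$c"
        fi
    elif cmp -s "$a" "$c"; then
        echo "$b"
    elif cmp -s "$b" "$c"; then
        echo "$a"
    else
        echo ambiguous
    fi
}
```

With only two mirrors there is no majority to appeal to, which is why md falls back on assuming the first copy is correct.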
* Re: data scrubbing
  2011-07-29 21:51       ` Mathias Burén
  2011-07-29 22:16         ` David Brown
@ 2011-07-29 22:37         ` Beolach
  1 sibling, 0 replies; 8+ messages in thread
From: Beolach @ 2011-07-29 22:37 UTC
To: Mathias Burén; +Cc: Mdadm

On Fri, Jul 29, 2011 at 15:51, Mathias Burén <mathias.buren@gmail.com> wrote:
> Sorry to chime in like this. After reading the above, is there a
> reason why anyone shouldn't _always_ use repair instead of check on a
> weekly RAID6 check? You have to run repair anyway after a check if any
> issues are found, right?
>
> Or does the system become vulnerable during a repair? (less redundant)

The primary purpose of data scrubbing a RAID is to detect and correct read errors on any of the member devices; both check and repair perform this function. Finding (and, with repair, correcting) mismatches is only a secondary purpose: a mismatch is reported only when there are no read errors but the data copies or parity blocks are found to be inconsistent.

In order to repair a mismatch, MD needs to restore consistency by overwriting the inconsistent data copy or parity blocks with the correct data. But because the underlying member devices did not return any errors, MD has no way of knowing which blocks are correct and which are incorrect. When it is told to do a repair, it assumes that the first copy in a RAID1 or RAID10, or the data (non-parity) blocks in RAID4/5/6, are correct, and corrects the mismatch based on that assumption.

That assumption may or may not be correct, and MD has no way of determining that reliably. The user might be able to, by using additional knowledge or tools, so MD gives the user the option to perform data scrubbing either with (repair) or without (check) MD correcting the mismatches using that assumption.

I hope that answers your question,
Beolach

* Re: data scrubbing
  2011-07-29  8:50 data scrubbing Nikolay Kichukov
  2011-07-29 10:03 ` Mikael Abrahamsson
@ 2011-07-29 17:17 ` Thomas Harold
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Harold @ 2011-07-29 17:17 UTC
To: Nikolay Kichukov; +Cc: linux-raid

On 7/29/2011 4:50 AM, Nikolay Kichukov wrote:
> Recently on this list it was discussed it is a good practice to
> perform data scrubbing for some raid levels. Can someone advise what
> raid levels need that operation scheduled on a regular basis? Perhaps
> all raid arrays that have:
>
> /sys/block/md*/md/sync_action
>
> [sync_action] property?
>
> For example is it good for raid1 array?

Yes, we run a script every week (different arrays on different nights) that looks like:

    #!/bin/sh
    # Start a read-only consistency check on md0.
    echo check > /sys/block/md0/md/sync_action
    # Block until the check has finished.
    mdadm --wait /dev/md0
    # Print the mismatch count recorded by the check.
    cat /sys/block/md0/md/mismatch_cnt

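One possible extension of the script above, sketched under assumptions: instead of always printing the raw counter, stay silent when mismatch_cnt is zero and print a one-line warning otherwise, so that a scheduler which mails command output only sends a message when something needs attention. The function name and the optional sysfs-root argument are hypothetical. No repair is started, which leaves the choice of correction method to the administrator, as discussed elsewhere in the thread.

```sh
#!/bin/sh
# report_mismatches ARRAY [SYSFS_ROOT]
# Read the array's mismatch_cnt after a check has finished.
# Returns 0 and prints nothing when the count is zero, returns 1 and
# prints a warning when it is non-zero, returns 2 if the attribute
# cannot be read. Hypothetical helper; SYSFS_ROOT defaults to /sys.
report_mismatches() {
    array="$1"
    root="${2:-/sys}"
    count=$(cat "$root/block/$array/md/mismatch_cnt" 2>/dev/null) || return 2
    if [ "$count" -ne 0 ]; then
        echo "$array: mismatch_cnt=$count after check; no repair attempted"
        return 1
    fi
    return 0
}
```

In the weekly script, `report_mismatches md0` would take the place of the final `cat`, after `mdadm --wait /dev/md0` has returned.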
end of thread (~2011-07-29 22:37 UTC)

Thread overview: 8+ messages
2011-07-29  8:50 data scrubbing Nikolay Kichukov
2011-07-29 10:03 ` Mikael Abrahamsson
2011-07-29 13:25   ` Nikolay Kichukov
2011-07-29 20:48     ` Beolach
2011-07-29 21:51       ` Mathias Burén
2011-07-29 22:16         ` David Brown
2011-07-29 22:37         ` Beolach
2011-07-29 17:17 ` Thomas Harold