* data scrubbing
@ 2011-07-29 8:50 Nikolay Kichukov
2011-07-29 10:03 ` Mikael Abrahamsson
2011-07-29 17:17 ` Thomas Harold
0 siblings, 2 replies; 8+ messages in thread
From: Nikolay Kichukov @ 2011-07-29 8:50 UTC
To: linux-raid
Hi all,
It was recently discussed on this list that regular data scrubbing is good practice for some RAID levels.
Can someone advise which RAID levels need that operation scheduled on a regular basis? Perhaps every array that exposes the sync_action attribute:
/sys/block/md*/md/sync_action
For example, is it worthwhile for a RAID1 array?
Cheers,
-Nik
* Re: data scrubbing
From: Mikael Abrahamsson @ 2011-07-29 10:03 UTC
To: Nikolay Kichukov; +Cc: linux-raid
On Fri, 29 Jul 2011, Nikolay Kichukov wrote:
> For example is it good for raid1 array?
Yes, it's good for all raid levels that have any kind of redundancy. You
want to read the information on the drives regularly to make sure it can
still be read, and if it can't, it can be reconstructed from the redundant
data (the mirror copy or parity) and rewritten.
Otherwise, rarely read data might have an error on one drive; if another
drive then fails, the rebuild finds that this data no longer exists
anywhere (RAID1 and RAID5), and you had no idea about this.
Scrubbing is good; do it regularly (at least monthly).
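A minimal sketch of scrubbing every redundant array in one pass. It assumes, as far as I know, that md only exposes sync_action for levels that can resync, so the file's presence is used as the test. SYSFS_ROOT and scrub_all are hypothetical names introduced here; SYSFS_ROOT exists only so the loop can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: start a "check" on every md array that exposes sync_action.
# SYSFS_ROOT is a hypothetical override for dry runs; the real path is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

scrub_all() {
    for action in "$SYSFS_ROOT"/block/md*/md/sync_action; do
        # If the glob matched nothing, the literal pattern is left: skip it.
        [ -e "$action" ] || continue
        state=$(cat "$action")
        if [ "$state" = "idle" ]; then
            # Writing "check" requires root on a real system.
            echo check > "$action"
            echo "started check: $action"
        else
            # A resync, recovery or other scrub is already running.
            echo "skipped ($state): $action"
        fi
    done
}
```

Calling scrub_all as root (for example from a monthly cron job) would start the checks; arrays that are busy are left alone.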
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: data scrubbing
From: Nikolay Kichukov @ 2011-07-29 13:25 UTC
To: Mikael Abrahamsson; +Cc: linux-raid
Hi,
This is good to know!
I just ran a check on a raid1 and got:
Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches found: 128
So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
cat /sys/block/md1/md/mismatch_cnt
128
Cheers,
-Nik
On 07/29/2011 01:03 PM, Mikael Abrahamsson wrote:
> On Fri, 29 Jul 2011, Nikolay Kichukov wrote:
>
>> For example is it good for raid1 array?
>
> Yes, it's good for all raid levels that have any kind of redundancy. You want to read the information on the drives
> regularly to make sure it can still be read, and if it can't, it can be recomputed from parity and written.
>
> Otherwise not-often-read data might have an error on one drive, and then another drive fails and now when you try to
> rebuild you don't have this data anywhere all of a sudden (RAID1 and RAID5), and you had no idea about this.
>
> Scrubbing is good, do it regularly (at least monthly).
>
* Re: data scrubbing
From: Thomas Harold @ 2011-07-29 17:17 UTC
To: Nikolay Kichukov; +Cc: linux-raid
On 7/29/2011 4:50 AM, Nikolay Kichukov wrote:
> Hi all,
>
> Recently on this list it was discussed it is a good practice to
> perform data scrubbing for some raid levels. Can someone advise what
> raid levels need that operation scheduled on a regular basis? Perhaps
> all raid arrays that have the sync_action attribute:
>
> /sys/block/md*/md/sync_action
>
> For example is it good for raid1 array?
>
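Yes, we run a script every week (different arrays on different nights)
that looks like:
#!/bin/sh
# Ask md to start a check (scrub) of md0.
echo check > /sys/block/md0/md/sync_action
# Block until the check has finished.
mdadm --wait /dev/md0
# Print the mismatch count recorded by the check.
cat /sys/block/md0/md/mismatch_cnt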
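One way to get "different arrays on different nights" from a single nightly cron entry is to map the weekday to an array name. This is only a sketch: the function name array_for_day and the md0/md1/md2 schedule are made up for illustration, not taken from the script above.

```shell
#!/bin/sh
# Map an ISO weekday number (1 = Monday ... 7 = Sunday) to the array
# scheduled for that night. The schedule is hypothetical: edit it to
# match the arrays you actually have.
array_for_day() {
    case "$1" in
        1) echo md0 ;;
        3) echo md1 ;;
        5) echo md2 ;;
        *) echo "" ;;   # nothing scheduled on the other nights
    esac
}

# Show what tonight's run would scrub, without touching any array.
md=$(array_for_day "$(date +%u)")
if [ -n "$md" ]; then
    echo "tonight: $md"
else
    echo "tonight: nothing scheduled"
fi
```

The returned name can then be substituted for md0 in the three commands of the weekly script.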
* Re: data scrubbing
From: Beolach @ 2011-07-29 20:48 UTC
To: Nikolay Kichukov; +Cc: Mdadm
On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov <hijacker@oldum.net> wrote:
> Hi,
>
> This is good to know!
>
> Just performed a check on a raid1 and got:
>
> Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches
> found: 128
>
> So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
>
> cat /sys/block/md1/md/mismatch_cnt
> 128
>
>
That depends on whether you did a "check" or a "repair" - see the SCRUBBING
AND MISMATCHES section of the md(4) man page:
"If check was used, then no action is taken to handle the mismatch,
it is simply recorded. If repair was used, then a mismatch will
be repaired in the same way that resync repairs arrays."
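As for the value itself: as I read the same md(4) section, mismatch_cnt counts 512-byte sectors, and md adds the size of the whole I/O unit it was comparing rather than the exact number of bad sectors. So 128 may be a single 64 KiB unit that failed to match. The arithmetic, with mismatch_kib as a hypothetical helper name:

```shell
#!/bin/sh
# Convert a mismatch_cnt value (512-byte sectors) to KiB.
mismatch_kib() {
    echo $(( $1 * 512 / 1024 ))
}

mismatch_kib 128   # prints 64
```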
Good luck,
Beolach
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: data scrubbing
From: Mathias Burén @ 2011-07-29 21:51 UTC
To: Beolach; +Cc: Nikolay Kichukov, Mdadm
On 29 July 2011 21:48, Beolach <beolach@gmail.com> wrote:
> On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov <hijacker@oldum.net> wrote:
>> Hi,
>>
>> This is good to know!
>>
>> Just performed a check on a raid1 and got:
>>
>> Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches
>> found: 128
>>
>> So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
>>
>> cat /sys/block/md1/md/mismatch_cnt
>> 128
>>
>>
>
> That depends on if you did a "check" or a "repair" - see the SCRUBBING
> AND MISMATCHES section of the md(4) man page:
> "If check was used, then no action is taken to handle the mismatch,
> it is simply recorded. If repair was used, then a mismatch will
> be repaired in the same way that resync repairs arrays."
>
>
> Good luck,
> Beolach
>
Sorry to chime in like this. After reading the above, is there a
reason why anyone shouldn't _always_ use repair instead of check on a
weekly RAID6 check? You have to run repair anyway after a check if any
issues are found, right?
Or does the system become vulnerable during a repair? (less redundant)
Thanks,
Mathias
* Re: data scrubbing
From: David Brown @ 2011-07-29 22:16 UTC
To: linux-raid
On 29/07/11 23:51, Mathias Burén wrote:
> On 29 July 2011 21:48, Beolach<beolach@gmail.com> wrote:
>> On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov<hijacker@oldum.net> wrote:
>>> Hi,
>>>
>>> This is good to know!
>>>
>>> Just performed a check on a raid1 and got:
>>>
>>> Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches
>>> found: 128
>>>
>>> So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
>>>
>>> cat /sys/block/md1/md/mismatch_cnt
>>> 128
>>>
>>>
>>
>> That depends on if you did a "check" or a "repair" - see the SCRUBBING
>> AND MISMATCHES section of the md(4) man page:
>> "If check was used, then no action is taken to handle the mismatch,
>> it is simply recorded. If repair was used, then a mismatch will
>> be repaired in the same way that resync repairs arrays."
>>
>>
>> Good luck,
>> Beolach
>>
>
> Sorry to chime in like this. After reading the above, is there a
> reason why anyone shouldn't _always_ use repair instead of check on a
> weekly RAID6 check? You have to run repair anyway after a check if any
> issues are found, right?
>
> Or does the system become vulnerable during a repair? (less redundant)
>
> Thanks,
> Mathias
If you do a repair, then when a mismatch is found, one side of the
redundancy is treated as "bad" and re-created. For raid1, the first copy
is assumed correct. For raid5/6, the data blocks are assumed correct and
the parity is re-created. As Neil Brown explained on his blog, without
any more information this is as good as md raid can do. However,
it is not necessarily as good as /you/ can do. For example, you might
be able to determine which files use the blocks in the mismatched
stripe, and figure out which block was bad. Or for 3-disk raid1 you
could pick the bad block as the odd one out (assuming the other two
match). For raid6, it's possible to spot a single-disk mismatch and
correct that one disk: for each disk in turn, assume it is missing and
re-create it from the other disks using normal raid6 recovery; if the
stripe is then consistent, you've fixed the mismatch.
However, such approaches are not necessarily correct either. Thus
"repair" just does the simplest and fastest correction of the
mismatch, and "check" leaves the stripe unchanged in case you want to
manually pick a different method.
<http://neil.brown.name/blog/20100211050355>
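The 3-disk raid1 "odd one out" idea can be sketched with cmp. This assumes you have already dumped the same region from each of the three members into three files; how to locate and extract that region is outside this sketch, and odd_one_out is a hypothetical helper name. md itself does nothing like this.

```shell
#!/bin/sh
# Print the file that disagrees with the other two, "none" if all three
# match, or "unknown" if all three differ (no majority to vote with).
odd_one_out() {
    a=$1 b=$2 c=$3
    if cmp -s "$a" "$b"; then
        # a and b agree, so only c can be the outlier.
        if cmp -s "$a" "$c"; then echo none; else echo "$c"; fi
    elif cmp -s "$a" "$c"; then
        echo "$b"
    elif cmp -s "$b" "$c"; then
        echo "$a"
    else
        echo unknown
    fi
}
```

A result of "unknown" means the vote cannot decide, which is the same position md is in with only two copies.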
* Re: data scrubbing
From: Beolach @ 2011-07-29 22:37 UTC
To: Mathias Burén; +Cc: Mdadm
On Fri, Jul 29, 2011 at 15:51, Mathias Burén <mathias.buren@gmail.com> wrote:
> On 29 July 2011 21:48, Beolach <beolach@gmail.com> wrote:
>> On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov <hijacker@oldum.net> wrote:
>>> Hi,
>>>
>>> This is good to know!
>>>
>>> Just performed a check on a raid1 and got:
>>>
>>> Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device mismatches
>>> found: 128
>>>
>>> So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?
>>>
>>> cat /sys/block/md1/md/mismatch_cnt
>>> 128
>>>
>>>
>>
>> That depends on if you did a "check" or a "repair" - see the SCRUBBING
>> AND MISMATCHES section of the md(4) man page:
>> "If check was used, then no action is taken to handle the mismatch,
>> it is simply recorded. If repair was used, then a mismatch will
>> be repaired in the same way that resync repairs arrays."
>>
>>
>> Good luck,
>> Beolach
>
> Sorry to chime in like this. After reading the above, is there a
> reason why anyone shouldn't _always_ use repair instead of check on a
> weekly RAID6 check? You have to run repair anyway after a check if any
> issues are found, right?
>
> Or does the system become vulnerable during a repair? (less redundant)
>
> Thanks,
> Mathias
>
The primary purpose of data scrubbing a RAID is to detect & correct
read errors on any of the member devices; both check and repair
perform this function. Finding (and, with repair, correcting) mismatches
is only a secondary purpose: a mismatch is reported only when there are
no read errors but the data copies or parity blocks are found to be
inconsistent. In order to repair a mismatch, MD needs to
restore consistency by overwriting the inconsistent data copy or
parity blocks with the correct data. But because the underlying member
devices did not return any errors, MD has no way of knowing which
blocks are correct and which are incorrect; when it is told to do a
repair, it assumes that the first copy in a RAID1 or
RAID10, or the data (non-parity) blocks in RAID4/5/6, are correct, and
corrects the mismatch based on that assumption.
That assumption may or may not be correct, and MD has no way of
determining that reliably. The user might be able to, by using
additional knowledge or tools, so MD gives the user the option to
perform data scrubbing either with (repair) or without (check) MD
correcting the mismatches using that assumption.
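In script form, the check-only policy might look like the sketch below: after a check has finished, report the mismatch count and leave the repair decision to the admin. report_mismatches and SYSFS_ROOT are hypothetical names; SYSFS_ROOT is there only so the function can be tried on a fake tree.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for dry runs; the real path is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Report the mismatch count left by the last check of array $1.
# Returns non-zero when mismatches were recorded; never starts a repair.
report_mismatches() {
    md=$1
    count=$(cat "$SYSFS_ROOT/block/$md/md/mismatch_cnt")
    if [ "$count" -eq 0 ]; then
        echo "$md: consistent"
    else
        echo "$md: $count mismatched sectors; review before running repair"
        return 1
    fi
}
```

Run as "report_mismatches md0" once "mdadm --wait /dev/md0" has returned, the non-zero exit status can drive a cron mail or a monitoring alert.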
I hope that answers your question,
Beolach