* Clarifications about check/repair, i.e. RAID SCRUBBING
From: Roy Waldspurger @ 2006-06-02 8:18 UTC (permalink / raw)
To: linux-raid
Hi Neil/folks,
I'm seeking some (hopefully) simple clarifications about the newer raid
checking and scrubbing behavior present in more recent kernels. I must
say that I was more than pleased when I learned about the new
functionality. Kudos, Neil for this addition. Unfortunately, because
this is new it's not to be found in the FAQs or HOW-TOs... with the
exception of the Gentoo "HOWTO Install on Software RAID".
I've looked at the following sources of info:
linux-2.6.16.19/Documentation/md.txt
linux-2.6.16.19/drivers/md:raid5.c and raid6main.c
(the raid5_end_read_request and raid6_end_read_request routines)
emails on the linux-raid mailing list, in particular:
http://lkml.org/lkml/2005/12/4/118
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg04615.html
===============================================================
In any regard:
I'm talking about triggering the following functionality:
echo check > /sys/block/mdX/md/sync_action
echo repair > /sys/block/mdX/md/sync_action
On a RAID5, and soon a RAID6, I'm looking to set up a cron job, and am
trying to figure out what exactly to schedule. The answers to the
following questions might shed some light on this:
1. GENERALLY SPEAKING, WHAT IS THE DIFFERENCE BETWEEN THE "CHECK" AND
"REPAIR" COMMANDS?
The "md.txt" doc mentions for "check" that "a repair may also happen for
some raid levels."
Which RAID levels, and in what cases? If I perform a "check" is there a
cache of bad blocks that need to be fixed that can quickly be repaired
by executing the "repair" command? Or would it go through the entire
array again? I'm working with new drives, and haven't come across any
bad blocks to test this with.
2. CAN "CHECK" BE RUN ON A DEGRADED ARRAY (say with N out of N+1 disks
on a RAID level 5)? I can test this out, but was it designed to do
this, versus "REPAIR" only working on a full set of active drives?
Perhaps "repair" is assuming that I have N+1 disks so that parity can be
WRITTEN?
3. RE: FEEDBACK/LOGGING: it seems that I might see some messages in
dmesg logging output such as "raid5:read error corrected!", is that
right? I realize that "mismatch_cnt" can also be used to see if there
was any "action" during a "check" or "repair." I'm assuming this stuff
doesn't make its way into an email.
4. DOES "REPAIR" PERFORM READS TO CHECK THE ARRAY, AND THEN WRITE TO THE
ARRAY *ONLY WHEN NECESSARY* TO PERFORM FIXES FOR CERTAIN BLOCKS? (I
know, it's sorta a repeat of question number 1+2).
5. IS THERE ILL-EFFECT TO STOP EITHER "CHECK" OR "REPAIR" BY ISSUING "IDLE"?
6. IS IT AT ALL POSSIBLE TO CHECK A CERTAIN RANGE OF BLOCKS? And to
keep track of which blocks were checked? The motivation is to start
checking some blocks overnight, and to pick-up where I left off the next
night...
7. ANY OTHER CONSIDERATIONS WHEN "SCRUBBING" THE RAID?
Sorry for some of these questions being so similar in nature. I just
want to make sure I understand it correctly.
Neil, again, a BIG thanks for this new functionality. I'm looking
forward to putting a system in place to exercise my drives!
Cheers,
-- roy
* Re: Clarifications about check/repair, i.e. RAID SCRUBBING
From: Neil Brown @ 2006-06-02 11:19 UTC (permalink / raw)
To: Roy Waldspurger; +Cc: linux-raid
On Friday June 2, rlists@frameandfocus.com wrote:
>
> In any regard:
>
> I'm talking about triggering the following functionality:
>
> echo check > /sys/block/mdX/md/sync_action
> echo repair > /sys/block/mdX/md/sync_action
>
> On a RAID5, and soon a RAID6, I'm looking to set up a cron job, and am
> trying to figure out what exactly to schedule. The answers to the
> following questions might shed some light on this:
>
> 1. GENERALLY SPEAKING, WHAT IS THE DIFFERENCE BETWEEN THE "CHECK" AND
> "REPAIR" COMMANDS?
> The "md.txt" doc mentions for "check" that "a repair may also happen for
> some raid levels."
> Which RAID levels, and in what cases? If I perform a "check" is there a
> cache of bad blocks that need to be fixed that can quickly be repaired
> by executing the "repair" command? Or would it go through the entire
> array again? I'm working with new drives, and haven't come across any
> bad blocks to test this with.
'check' just reads everything and doesn't trigger any writes unless a
read error is detected, in which case the normal read-error handling
kicks in. So it can be useful on a read-only array.
'repair' does the same, but when it finds an inconsistency it corrects
it by writing something.
If any raid personality had not been taught to specifically understand
'check', then a 'check' run would effect a 'repair'. I think 2.6.17
will have all personalities doing the right thing.
check doesn't keep a record of problems, just a count. 'repair' will
reprocess the whole array.
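
A minimal sketch of the workflow described above, not a definitive script: start a pass, wait for it to finish, then read the count. It assumes the array is md0 and that the count is exposed as `mismatch_cnt` (the attribute name used in md.txt). The `MD_SYSFS` and `POLL_SECONDS` variables and the function names are illustrative only.

```shell
#!/bin/sh
# MD_SYSFS is an illustrative variable: the md sysfs directory of the array.
# Assumption: the array is md0; adjust to your device.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Start a scrub pass; $1 is "check" (read-only) or "repair" (fix mismatches).
start_scrub() {
    case "$1" in
        check|repair) printf '%s\n' "$1" > "$MD_SYSFS/sync_action" ;;
        *) echo "usage: start_scrub check|repair" >&2; return 2 ;;
    esac
}

# Block until the kernel reports that the pass has ended (sync_action is "idle").
wait_for_idle() {
    while [ "$(cat "$MD_SYSFS/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Print the count left by the last pass. Only a count is kept, not a list
# of locations, so a later "repair" has to walk the whole array again.
mismatch_count() {
    cat "$MD_SYSFS/mismatch_cnt"
}

# Example (as root): start_scrub check && wait_for_idle && mismatch_count
```

Because 'check' keeps no record of where the problems were, there is no shortcut from a non-zero count to a quick fix: the follow-up is simply `start_scrub repair`, which rereads everything.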
>
> 2. CAN "CHECK" BE RUN ON A DEGRADED ARRAY (say with N out of N+1 disks
> on a RAID level 5)? I can test this out, but was it designed to do
> this, versus "REPAIR" only working on a full set of active drives?
> Perhaps "repair" is assuming that I have N+1 disks so that parity can be
> WRITTEN?
No, check on a degraded raid5, or a raid6 with 2 missing devices, or a
raid1 with only one device will not do anything. It will terminate
immediately. After all, there is nothing useful that it can do.
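
Since a check on a degraded array ends immediately, a cron job may want to notice that case rather than silently do nothing. The following is a sketch under a stated assumption: that the running kernel exposes a `degraded` attribute (a count of missing devices) in the md sysfs directory, which may not be true of every kernel version. `MD_SYSFS` and the function names are hypothetical.

```shell
#!/bin/sh
# Assumption: the array is md0; MD_SYSFS is an illustrative variable.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Succeeds only when md/degraded reads exactly "0" (no missing devices).
# Assumption: the kernel provides md/degraded; if the file is absent or
# unreadable this fails, so the caller skips the check rather than guessing.
array_is_complete() {
    [ "$(cat "$MD_SYSFS/degraded" 2>/dev/null)" = "0" ]
}

# Request a check only when all devices are present; otherwise say why not.
check_if_complete() {
    if array_is_complete; then
        echo check > "$MD_SYSFS/sync_action"
    else
        echo "array degraded or state unknown, skipping check" >&2
        return 1
    fi
}
```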
>
> 3. RE: FEEDBACK/LOGGING: it seems that I might see some messages in
> dmesg logging output such as "raid5:read error corrected!", is that
> right? I realize that "mismatch_cnt" can also be used to see if there
> was any "action" during a "check" or "repair." I'm assuming this stuff
> doesn't make its way into an email.
You are correct on all counts. mdadm --monitor doesn't know about
this yet. ((writes notes in mdadm todo list)).
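
Until mdadm --monitor reports this, one possible workaround is to let cron do the mailing: cron mails whatever a job prints, so a job that stays silent on a clean array and prints one line otherwise produces mail only when there is something to look at. This is a sketch, assuming the array is md0 and that the count is read from `mismatch_cnt`; `MD_SYSFS` and `report_mismatches` are made-up names.

```shell
#!/bin/sh
# Assumption: the array is md0; MD_SYSFS is an illustrative variable.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Print a one-line report only when the last pass counted mismatches.
# Silent (exit 0) on a clean array, so cron sends no mail in the normal case.
# Exit 1 when mismatches were counted, 2 when the count could not be read.
report_mismatches() {
    count="$(cat "$MD_SYSFS/mismatch_cnt")" || return 2
    if [ "$count" -ne 0 ]; then
        echo "md scrub: mismatch_cnt=$count in $MD_SYSFS"
        return 1
    fi
}
```

The function would be run after a check or repair pass has finished; "read error corrected" messages still appear only in the kernel log.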
>
> 4. DOES "REPAIR" PERFORM READS TO CHECK THE ARRAY, AND THEN WRITE TO THE
> ARRAY *ONLY WHEN NECESSARY* TO PERFORM FIXES FOR CERTAIN BLOCKS? (I
> know, it's sorta a repeat of question number 1+2).
>
repair only writes when necessary. In the normal case, it will only
read every block.
> 5. IS THERE ILL-EFFECT TO STOP EITHER "CHECK" OR "REPAIR" BY ISSUING "IDLE"?
No.
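
Since stopping is harmless, a scheduled job can end a pass when the quiet window closes. A minimal sketch, assuming the array is md0; `MD_SYSFS` and `stop_scrub` are illustrative names. It writes "idle" only when a check or repair is in progress, leaving any other activity alone, which is a design choice of this sketch and not something the thread prescribes.

```shell
#!/bin/sh
# Assumption: the array is md0; MD_SYSFS is an illustrative variable.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Stop a running scrub by writing "idle", but only if the current action
# is a check or repair; anything else (idle, resync, recover) is untouched.
stop_scrub() {
    case "$(cat "$MD_SYSFS/sync_action")" in
        check|repair) echo idle > "$MD_SYSFS/sync_action" ;;
        *) return 0 ;;
    esac
}
```

Note that the next pass starts over from the beginning, since there is not yet a way to resume from where the previous one stopped.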
>
> 6. IS IT AT ALL POSSIBLE TO CHECK A CERTAIN RANGE OF BLOCKS? And to
> keep track of which blocks were checked? The motivation is to start
> checking some blocks overnight, and to pick-up where I left off the next
> night...
Not yet. It might be possible one day.
>
> 7. ANY OTHER CONSIDERATIONS WHEN "SCRUBBING" THE RAID?
>
Not that I am aware of.
NeilBrown
> Sorry for some of these questions being so similar in nature. I just
> want to make sure I understand it correctly.
>
> Neil, again, a BIG thanks for this new functionality. I'm looking
> forward to putting a system in place to exercise my drives!
>
> Cheers,
>
> -- roy
* Re: Clarifications about check/repair, i.e. RAID SCRUBBING
From: Roy Waldspurger @ 2006-06-02 17:46 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown, Matt Hartman
Thanks for clearing things up, Neil. Looks like I will be issuing
weekly "repairs" on most of the arrays.
Cheers,
-- roy
* Raid5 read error correction log
From: Patrik Jonsson @ 2006-06-03 23:26 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Hey Neil,
It would sure be nice if the log contained any info about the error
correction that's been done rather than simply saying "read error
corrected", like which array chunk, device and sector was corrected. I'm
having a persistent pending sector on a drive, and when I do check or
repair, it says "read error corrected" many times, but I don't know
whether it's doing the same sector over and over or if there are just so
many of them... I seem to remember reading something about this on the
list some time ago, is it already in the kernel? (I'm running 2.6.17-rc4
now).
Btw, when it does correct a read error, I assume it also tries to read
it again to verify that the correction worked?
thanks,
/Patrik
* Re: Raid5 read error correction log
From: Neil Brown @ 2006-06-04 23:03 UTC (permalink / raw)
To: Patrik Jonsson; +Cc: linux-raid
On Saturday June 3, patrik@ucolick.org wrote:
> Hey Neil,
>
> It would sure be nice if the log contained any info about the error
> correction that's been done rather than simply saying "read error
> corrected", like which array chunk, device and sector was corrected. I'm
> having a persistent pending sector on a drive, and when I do check or
> repair, it says "read error corrected" many times, but I don't know
> whether it's doing the same sector over and over or if there are just so
> many of them... I seem to remember reading something about this on the
> list some time ago, is it already in the kernel? (I'm running 2.6.17-rc4
> now).
Yes.... added to todo list:
include sector/dev info in read-error-corrected messages
>
> Btw, when it does correct a read error, I assume it also tries to read
> it again to verify that the correction worked?
Yes. It doesn't check that the read returns the correct data, but it
does check that a read succeeds. However I'm not certain that the
read request will punch through any cache on the drive. It could be
that the reads return data out of the cache without accessing data on
the surface of the disk....
NeilBrown