* Help on first dangerous scrub / suggestions
@ 2009-11-26 12:14 Asdo
  2009-11-26 12:22 ` Justin Piszcz
  2009-11-26 14:03 ` Mikael Abrahamsson
  0 siblings, 2 replies; 13+ messages in thread
From: Asdo @ 2009-11-26 12:14 UTC (permalink / raw)
  To: linux-raid

Hi all
we have a server with a 12-disk RAID-6.
It has been up for a year now, but I have never scrubbed it because at
the time I did not know about this good practice (a note in the mdadm
man page would help).
The array is currently not degraded and has spares.

Now I am scared about initiating the first scrub, because if it turns
out that 3 areas on different disks have bad sectors I think I am going
to lose the whole array.

Doing backups now is also scary, because if I hit a bad (uncorrectable)
area on any one of the disks while reading, a rebuild will start on the
spare, and that is like initiating the scrub with all the associated
risks.

On this point, I would like to suggest a new "mode" for the array,
let's call it "nodegrade", in which no degradation can occur and I/O on
unreadable areas simply fails with an I/O error. By temporarily putting
the array in that mode, one could at least back up without anxiety. I
understand it would not be possible to add a spare or rebuild in this
mode, but that's OK.

BTW, I would like to ask about the "readonly" mode mentioned here:
http://www.mjmwired.net/kernel/Documentation/md.txt
Upon a read error, will it initiate a rebuild / degrade the array or not?

Anyway the "nodegrade" mode I suggest above would be still more useful 
because you do not need to put the array in readonly mode, which is 
important for doing backups during normal operation.

Coming back to my problem, I think the best approach would probably be
to first collect information on how good my 12 drives are, and I can
probably do that by reading each device, e.g.
  dd if=/dev/sda of=/dev/null
and seeing how many of them read with errors. I just hope my 3ware disk
controllers won't disconnect the whole drive upon a read error.
(Does anyone have a better strategy?)
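
Concretely, I had something like this in mind (only a sketch; the
/dev/sd[a-l] range is an assumption, substitute the real member devices):

  # read every member end-to-end, keep going past read errors,
  # and note which drives report problems (also watch dmesg)
  for d in /dev/sd[a-l]; do
      echo "=== $d ==="
      dd if="$d" of=/dev/null bs=1M conv=noerror 2>&1 | tail -n 3
  done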

But then, if it turns out that 3 of them do have unreadable areas, I am
screwed anyway. Even with dd_rescue there is no strategy that can save
my data, even if the unreadable areas are placed differently on the 3
disks (and that is a case where it should instead be possible to get the
data back).

This brings me to my second suggestion:
I would like to see 12 (in my case) devices like:
  /dev/md0_fromparity/{sda1,sdb1,...}   (all readonly)
that behave like this: when reading from /dev/md0_fromparity/sda1, what
comes out are the bytes that should be on sda1, but computed from the
other disks. Reading from these devices should never degrade an array;
at most it should give a read error.

Why is this useful?
Because one could recover sda1 from a disaster-struck array with
multiple unreadable areas (unless too many of them overlap), as in the
sketch below. With the array in "nodegrade" mode and the block device
marked readonly:
  1- dd_rescue from /dev/sda1 to /dev/sdz1   [sdz is a good drive that
will eventually take sda's place]
     take note of the failed sectors
  2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1, only for the
sectors that were unreadable above
  3- stop the array, take out sda1, and reassemble the array with sdz1
in place of sda1
... repeat for all the other drives to get a good array back.
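
With GNU ddrescue and its mapfile this would boil down to roughly the
following (only a sketch: /dev/md0_fromparity/* is the hypothetical
device I am proposing, it does not exist today, and the device names are
examples):

  # pass 1: copy what is readable directly from the failing member,
  # recording the unreadable areas in the mapfile
  ddrescue -f /dev/sda1 /dev/sdz1 sda1.map
  # pass 2: same mapfile, different source; only the areas still marked
  # bad are retried, this time reconstructed from parity
  ddrescue -f -r1 /dev/md0_fromparity/sda1 /dev/sdz1 sda1.map
  # then stop the array and reassemble with sdz1 in place of sda1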

What do you think?

I have another question on scrubbing: I am not sure about the exact
behaviour of "check" and "repair":
- Will "check" degrade an array if it finds an uncorrectable read error?
The manual only mentions what happens when the parity does not match the
data, but that is not what I am interested in right now.
- Will "repair"...? (same question as above)

Thanks for your comments

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 12:14 Help on first dangerous scrub / suggestions Asdo
@ 2009-11-26 12:22 ` Justin Piszcz
  2009-11-26 14:06   ` Asdo
  2009-11-26 14:03 ` Mikael Abrahamsson
  1 sibling, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 12:22 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid



On Thu, 26 Nov 2009, Asdo wrote:

> Hi all
> we have a server with a 12 disks raid-6.
> It has been up for 1 year now but I have never scrubbed it because at the 
> time I did not know about this good practice (a note on man mdadm would 
> help).
> The array is currently not degraded and has spares.
>
> Now I am scared about initiating the first scrub because if it turns out that 
> 3 areas in different disks have bad sectors I think am gonna lose the whole 
> array.
>
> Doing backups now it's also scary because if I hit a bad (uncorrectable) area 
> in anyone of the disks while reading, a rebuild will start on the spare and 
> that's like initiating the scrub with all associated risks.
>
> About this point, I would like to suggest a new "mode" of the array, let's 
> call it "nodegrade" in which no degradation can occur, and I/O in unreadable 
> areas simply fails with I/O error. By temporarily putting the array in that 
> mode, at least one could backup without anxiety. I understand it would not be 
> possible to add a spare / rebuild in this mode but that's ok.
>
> BTW I would like to ask an info on "readonly" mode mentioned here:
> http://www.mjmwired.net/kernel/Documentation/md.txt
> upon read error, will it initiate a rebuild / degrade the array or not?
>
> Anyway the "nodegrade" mode I suggest above would be still more useful 
> because you do not need to put the array in readonly mode, which is important 
> for doing backups during normal operation.
>
> Coming back to my problem, I have thought that the best approach would 
> probably be to first collect information on how good are my 12 drives, and I 
> probably can do that by reading each device like
> dd if=/dev/sda of=/dev/null
> and see how many of them read with errors. I just hope my 3ware disk 
> controllers won't disconnect the whole drive upon read error.
> (anyone has a better strategy?)
>
> But then if it turns out that 3 of them indeed have unreadable areas I am 
> screwed anyway. Even with dd_rescue there's no strategy that can save my 
> data, even if the unreadable areas have different placement in the 3 disks 
> (and that's a case where it should instead be possible to get data back).
>
> This brings to my second suggestion:
> I would like to see 12 (in my case) devices like:
> /dev/md0_fromparity/{sda1,sdb1,...}   (all readonly)
> that behave like this: when reading from /dev/md0_fromparity/sda1 , what 
> comes out is the bytes that should be in sda1, but computed from the other 
> disks. Reading from these devices should never degrade an array, at most give 
> read error.
>
> Why is this useful?
> Because one could recover sda1 from a disastered array with multiple 
> unreadable areas (unless too many are overlapping) in this way:
> With the array in "nodegrade" mode and blockdevice marked as readonly:
> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1   [sdz is a good drive to eventually 
> take sda place]
>    take note of failed sectors
> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the sectors 
> that were unreadable from above
> 3- stop array, take out sda1, and reassemble the array with sdz1 in place of 
> sda1
> ... repeat for all the other drives to get a good array back.
>
> What do you think?
>
> I have another question on scrubbing: I am not sure about the exact behaviour 
> of "check" and "repair":
> - will "check" degrade an array if it finds an uncorrectable read-error? The 
> manual only mentions what happens if the checksums of the parity disks don't 
> match with data, but that's not what I'm interested in right now.
> - will "repair" .... (same question as above)
>
> Thanks for your comments

Have you gotten any filesystem errors thus far?

How bad are the disks?
Can you show the smartctl -a output of each of the 12 drives?
Can you rsync all of the data to another host?
What filesystem is being used?

If your disks are failing, I'd recommend an rsync ASAP over trying to
read/write/test the disks with dd or other tools.
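
For example, something like this (the paths and destination host are
just placeholders):

  rsync -aHAX --progress /mnt/array/ backuphost:/backup/array/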

Justin.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 12:14 Help on first dangerous scrub / suggestions Asdo
  2009-11-26 12:22 ` Justin Piszcz
@ 2009-11-26 14:03 ` Mikael Abrahamsson
  2009-11-26 14:13   ` Asdo
  1 sibling, 1 reply; 13+ messages in thread
From: Mikael Abrahamsson @ 2009-11-26 14:03 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

On Thu, 26 Nov 2009, Asdo wrote:

> Now I am scared about initiating the first scrub because if it turns out 
> that 3 areas in different disks have bad sectors I think am gonna lose 
> the whole array.

What kernel are you using?

As of 2.6.15 or so, sending "repair" (or "resync", I don't remember
exactly) to the md device will read all the data, and if a sector is
unreadable the parity will be used to rewrite it (it shouldn't kick the
disk).

<http://linux-raid.osdl.org/index.php/RAID_Administration>

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 12:22 ` Justin Piszcz
@ 2009-11-26 14:06   ` Asdo
  2009-11-26 14:38     ` Justin Piszcz
  0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-26 14:06 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>> Hi all
>> we have a server with a 12 disks raid-6.
>> It has been up for 1 year now but I have never scrubbed it because at 
>> the time I did not know about this good practice (a note on man mdadm 
>> would help).
>> The array is currently not degraded and has spares.
>>
>> Now I am scared about initiating the first scrub because if it turns 
>> out that 3 areas in different disks have bad sectors I think am gonna 
>> lose the whole array.
>>
>> Doing backups now it's also scary because if I hit a bad 
>> (uncorrectable) area in anyone of the disks while reading, a rebuild 
>> will start on the spare and that's like initiating the scrub with all 
>> associated risks.
>>
>> About this point, I would like to suggest a new "mode" of the array, 
>> let's call it "nodegrade" in which no degradation can occur, and I/O 
>> in unreadable areas simply fails with I/O error. By temporarily 
>> putting the array in that mode, at least one could backup without 
>> anxiety. I understand it would not be possible to add a spare / 
>> rebuild in this mode but that's ok.
>>
>> BTW I would like to ask an info on "readonly" mode mentioned here:
>> http://www.mjmwired.net/kernel/Documentation/md.txt
>> upon read error, will it initiate a rebuild / degrade the array or not?
>>
>> Anyway the "nodegrade" mode I suggest above would be still more 
>> useful because you do not need to put the array in readonly mode, 
>> which is important for doing backups during normal operation.
>>
>> Coming back to my problem, I have thought that the best approach 
>> would probably be to first collect information on how good are my 12 
>> drives, and I probably can do that by reading each device like
>> dd if=/dev/sda of=/dev/null
>> and see how many of them read with errors. I just hope my 3ware disk 
>> controllers won't disconnect the whole drive upon read error.
>> (anyone has a better strategy?)
>>
>> But then if it turns out that 3 of them indeed have unreadable areas 
>> I am screwed anyway. Even with dd_rescue there's no strategy that can 
>> save my data, even if the unreadable areas have different placement 
>> in the 3 disks (and that's a case where it should instead be possible 
>> to get data back).
>>
>> This brings to my second suggestion:
>> I would like to see 12 (in my case) devices like:
>> /dev/md0_fromparity/{sda1,sdb1,...}   (all readonly)
>> that behave like this: when reading from /dev/md0_fromparity/sda1 , 
>> what comes out is the bytes that should be in sda1, but computed from 
>> the other disks. Reading from these devices should never degrade an 
>> array, at most give read error.
>>
>> Why is this useful?
>> Because one could recover sda1 from a disastered array with multiple 
>> unreadable areas (unless too many are overlapping) in this way:
>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1   [sdz is a good drive to 
>> eventually take sda place]
>>    take note of failed sectors
>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the 
>> sectors that were unreadable from above
>> 3- stop array, take out sda1, and reassemble the array with sdz1 in 
>> place of sda1
>> ... repeat for all the other drives to get a good array back.
>>
>> What do you think?
>>
>> I have another question on scrubbing: I am not sure about the exact 
>> behaviour of "check" and "repair":
>> - will "check" degrade an array if it finds an uncorrectable 
>> read-error? The manual only mentions what happens if the checksums of 
>> the parity disks don't match with data, but that's not what I'm 
>> interested in right now.
>> - will "repair" .... (same question as above)
>>
>> Thanks for your comments
>
> Have you gotten any filesystem errors thus far?
> How bad are the disks?
Only one disk has given correctable read errors in dmesg, twice (no
filesystem errors), 64 sectors in sequence each time.
smartctl -a indeed reports those errors on that disk, and no errors on
any of the other disks.
(on the partially-bad disk:
  SMART overall-health self-assessment test result: PASSED
  ...
  1 Raw_Read_Error_Rate     0x000f   200   200   051   Pre-fail  Always   -   138
  ...
  5 Reallocated_Sector_Ct   0x0033   200   200   140   Pre-fail  Always   -   0
the other disks show: PASSED, 0, 0)
However, I have never run the smartctl self-tests, so the only errors
smartctl is aware of are indeed those I also got from md.

> Can you show the smartctl -a output of each of the 12 drives?
> Can you rsync all of the data to another host?
> What filesystem is being used?
>
> If your disks are failing I'd recommend an rsync ASAP over trying to 
> read/write/test the disks with dd or other tests.
The filesystem is ext3.
About the rsync I am worried: have you read my original post? If rsync
hits an area with uncorrectable read errors, the rebuild will start, and
if it then turns out there are 2 other partially-unreadable disks I will
lose the array. And I will lose it *right now*, without knowing for sure
beforehand.
What drawbacks do you see in the dd test I proposed? It is just a probe
to get an idea of how bad the situation is, without changing it yet...




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 14:03 ` Mikael Abrahamsson
@ 2009-11-26 14:13   ` Asdo
  0 siblings, 0 replies; 13+ messages in thread
From: Asdo @ 2009-11-26 14:13 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

Mikael Abrahamsson wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>> Now I am scared about initiating the first scrub because if it turns 
>> out that 3 areas in different disks have bad sectors I think am gonna 
>> lose the whole array.
>
> What kernel are you using?
>
> As of 2.6.15 or so, sending "repair" (or "resync", I don't remember 
> exactly) to the md will read all data and if there is bad data, parity 
> will be used to write to the bad sector (it shouldn't kick the disk).
>
> <http://linux-raid.osdl.org/index.php/RAID_Administration>
>
The kernel is the Ubuntu 2.6.24 kernel.
On the page you link I don't see any mention of the fact that drives
won't be kicked by a "repair" or "check".
In fact, regarding "check", this is written:
  'check' just reads everything and doesn't trigger any writes unless a
read error is detected, in which case the normal read-error handling
kicks in.
"Normal read-error handling" seems to suggest that if the read error is
uncorrectable the drive will be kicked. Don't you think so?

Thank you



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 14:06   ` Asdo
@ 2009-11-26 14:38     ` Justin Piszcz
  2009-11-26 19:02       ` Asdo
  0 siblings, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 14:38 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid



On Thu, 26 Nov 2009, Asdo wrote:

>>> 
>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>> upon read error, will it initiate a rebuild / degrade the array or not?
This is a good question, but it is difficult to test, as each use case
is different. That would be a question for Neil.

>>> 
>>> Anyway the "nodegrade" mode I suggest above would be still more useful 
>>> because you do not need to put the array in readonly mode, which is 
>>> important for doing backups during normal operation.
>>> 
>>> Coming back to my problem, I have thought that the best approach would 
>>> probably be to first collect information on how good are my 12 drives, and 
>>> I probably can do that by reading each device like
>>> dd if=/dev/sda of=/dev/null
>>> and see how many of them read with errors. I just hope my 3ware disk 
>>> controllers won't disconnect the whole drive upon read error.
>>> (anyone has a better strategy?)
I see where you're going here. Read below, but if you go this route I
assume you would first stop the array (mdadm -S /dev/mdX) and then test
each individual disk, one at a time?

>>> 
>>> But then if it turns out that 3 of them indeed have unreadable areas I am 
>>> screwed anyway. Even with dd_rescue there's no strategy that can save my 
>>> data, even if the unreadable areas have different placement in the 3 disks 
>>> (and that's a case where it should instead be possible to get data back).
So wouldn't your priority be to copy/rsync the *MOST* important data
off the machine first, before resorting to more invasive methods?

>>> 
>>> This brings to my second suggestion:
>>> I would like to see 12 (in my case) devices like:
>>> /dev/md0_fromparity/{sda1,sdb1,...}   (all readonly)
>>> that behave like this: when reading from /dev/md0_fromparity/sda1 , what 
>>> comes out is the bytes that should be in sda1, but computed from the other 
>>> disks. Reading from these devices should never degrade an array, at most 
>>> give read error.
>>> 
>>> Why is this useful?
>>> Because one could recover sda1 from a disastered array with multiple 
>>> unreadable areas (unless too many are overlapping) in this way:
>>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1   [sdz is a good drive to 
>>> eventually take sda place]
>>>    take note of failed sectors
>>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the 
>>> sectors that were unreadable from above
>>> 3- stop array, take out sda1, and reassemble the array with sdz1 in place 
>>> of sda1
>>> ... repeat for all the other drives to get a good array back.
>>> 
>>> What do you think?
While this may be possible, has anyone on this list done something like
this and had it work successfully?

>>> 
>>> I have another question on scrubbing: I am not sure about the exact 
>>> behaviour of "check" and "repair":
>>> - will "check" degrade an array if it finds an uncorrectable read-error? 
From README.checkarray:

'check' is a read-only operation, even though the kernel logs may suggest
otherwise (e.g. /proc/mdstat and several kernel messages will mention
"resync"). Please also see question 21 of the FAQ.

If, however, while reading, a read error occurs, the check will trigger the
normal response to read errors which is to generate the 'correct' data and try
to write that out - so it is possible that a 'check' will trigger a write.
However in the absence of read errors it is read-only.

Per md.txt:

        resync        - redundancy is being recalculated after unclean
                        shutdown or creation

        repair        - A full check and repair is happening.  This is
                        similar to 'resync', but was requested by the
                        user, and the write-intent bitmap is NOT used to
                        optimise the process.

        check         - A full check of redundancy was requested and is
                        happening.  This reads all block and checks
                        them. A repair may also happen for some raid
                        levels.

>>> The manual only mentions what happens if the checksums of the parity disks 
>>> don't match with data, but that's not what I'm interested in right now.
>>> - will "repair" .... (same question as above)
>>> 
>>> Thanks for your comments
>> 
>> Have you gotten any filesystem errors thus far?
>> How bad are the disks?
> Only one disk gave correctable read errors in dmesg twice (no filesystem 
> errors), 64 sectors in sequence each time.
> Smartctl -a reports indeed those errors on that disk, and no errors on all 
> the other disks.
> (
> on the partially-bad disk:
> SMART overall-health self-assessment test result: PASSED
> ...
> 1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always 
> -       138
> ...
> 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always 
> -       0
> the other disks have values: PASSED, 0, 0
> )
> However I never ran smartctl tests, so the only errors smartctl is aware of 
> are indeed those I also got from md.
Ouch. In addition, if you do not run the 'offline' test mentioned in
the smartctl manpage, the offline-test-related attributes will NOT be
updated, so there is no way to tell how bad the disk really is; the
smartctl statistics for those disks are unknown because they have never
been refreshed. I once had a really weird issue with an mdadm RAID-1
where one disk kept dropping out of the array (two Raptor 150s); I had
not run the offline test, got fed up with it all, and put the disks on a
3ware controller. Shortly thereafter I built a new RAID-1 with the same
disks and saw many reallocated sectors: the drive was on its way out.
However, since I had not run an offline test before, the disk had looked
completely FINE; all SMART tests had passed (short, long) and the output
of smartctl -a looked good too!
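
For the record, the tests I mean are simply the following (sdX being a
placeholder for each member disk):

  smartctl -t offline /dev/sdX   # refreshes the offline-collected attributes
  smartctl -t long /dev/sdX      # full surface read scan
  smartctl -l selftest /dev/sdX  # view the results once the tests complete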

>
>> Can you show the smartctl -a output of each of the 12 drives?
>> Can you rsync all of the data to another host?
>> What filesystem is being used?
>> 
>> If your disks are failing I'd recommend an rsync ASAP over trying to 
>> read/write/test the disks with dd or other tests.
> Filesystem is ext3
> For the rsync I am worried, have you read my original post? If rsync hits an 
> area with uncorrectable read errors the rebuild will start and then if turns 
> out there are other 2 partially-unreadable disks I will lose the array. And I 
> will lose it *right now* and without knowing for sure before.
Per your other reply, it is plausible that what you describe may occur.
I have to ask, though: if you have 12 disks on a 3ware controller, why
are you not using its hardware RAID-6? Whenever there is a read error, a
3ware controller simply remaps the bad sector and marks it as bad,
sector by sector, and it does not drop the drive out of the array until
there are more than roughly 100-300 reallocated sectors (if these are
enterprise drives, and depending on how the drive fails, of course).

Aside from that, if your array is, say, 50% full and you rsync, you only
need to read what is on the disks and not the entire array (as you would
with dd). In addition, this also lets you rsync your most important data
off first, at your choosing. If you go ahead with the dd test and find
that 3 disks fail during the process, what have you gained?

There is a risk either way. Your method may carry less risk as long as
no drive fails completely during the read tests, whereas if you copy or
rsync the data you may or may not succeed; however, in the second
scenario you (hopefully) end up with the data in a second location,
against which you can then run all of the tests you want.


> What are the drawbacks you see against the dd test I proposed? It's just to 
> probe to have an idea of how bad is the situation, without changing the 
> situation yet...
Maybe. As long as the dd test does not brick the drives; that is
unlikely, but it could happen.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 14:38     ` Justin Piszcz
@ 2009-11-26 19:02       ` Asdo
  2009-11-26 20:55         ` Justin Piszcz
  0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-26 19:02 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>>>>
>>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>>> upon read error, will it initiate a rebuild / degrade the array or 
>>>> not?
> This is a good question but it is difficult to test as each use case is
> different. That would be a question for Neil.
>
>>>>
>>>> Anyway the "nodegrade" mode I suggest above would be still more 
>>>> useful because you do not need to put the array in readonly mode, 
>>>> which is important for doing backups during normal operation.
>>>>
>>>> Coming back to my problem, I have thought that the best approach 
>>>> would probably be to first collect information on how good are my 
>>>> 12 drives, and I probably can do that by reading each device like
>>>> dd if=/dev/sda of=/dev/null
>>>> and see how many of them read with errors. I just hope my 3ware 
>>>> disk controllers won't disconnect the whole drive upon read error.
>>>> (anyone has a better strategy?)
> I see where you're going here.  Read below but if you go this route I 
> assume
> you would first stop the array (?) mdadm -S /dev/mdX and then test each
> individual disk one at a time?

I don't plan to stop the array before reading the drives. Reads should
not be harmful...
Do you think otherwise?
I think I have read the drives on a mounted array in the past and it was
no problem.

>>>>
>>>> But then if it turns out that 3 of them indeed have unreadable 
>>>> areas I am screwed anyway. Even with dd_rescue there's no strategy 
>>>> that can save my data, even if the unreadable areas have different 
>>>> placement in the 3 disks (and that's a case where it should instead 
>>>> be possible to get data back).
> So wouldn't your priority to copy/rsync the *MOST* important data off the
> machine first before resorting to more invasive methods?

Yes, I will eventually do that if I find more than 2 drives with read
errors.
(A dd read of the individual drives is less invasive than an rsync, IMHO.)
So you are saying that even if I find fewer than 2 disks with read
errors (which might even be correctable) with the dd reads, you would
still proceed with a backup before the scrub?

(Actually I would also need to test the spares for write functionality,
heck...
Oh well... I have many spares...)

I really miss a "nodegrade" mode as described in my original post :-/
("undegradable" would probably be a more correct name, btw)

>>>>
>>>> This brings to my second suggestion:
>>>> I would like to see 12 (in my case) devices like:
>>>> /dev/md0_fromparity/{sda1,sdb1,...}   (all readonly)
>>>> that behave like this: when reading from /dev/md0_fromparity/sda1 , 
>>>> what comes out is the bytes that should be in sda1, but computed 
>>>> from the other disks. Reading from these devices should never 
>>>> degrade an array, at most give read error.
>>>>
>>>> Why is this useful?
>>>> Because one could recover sda1 from a disastered array with 
>>>> multiple unreadable areas (unless too many are overlapping) in this 
>>>> way:
>>>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>>>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1   [sdz is a good drive to 
>>>> eventually take sda place]
>>>>    take note of failed sectors
>>>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for 
>>>> the sectors that were unreadable from above
>>>> 3- stop array, take out sda1, and reassemble the array with sdz1 in 
>>>> place of sda1
>>>> ... repeat for all the other drives to get a good array back.
>>>>
>>>> What do you think?
> While this may be possibly, has anyone on this list done something 
> like this
> and had it work successfully?

Nobody could have tried this, because the
/dev/md0_fromparity/{sda1,sdb1,...} devices do not exist. This is a
feature request...

>>>>
>>>> I have another question on scrubbing: I am not sure about the exact 
>>>> behaviour of "check" and "repair":
>>>> - will "check" degrade an array if it finds an uncorrectable 
>>>> read-error? 
>> From README.checkarray:
>
> 'check' is a read-only operation, even though the kernel logs may suggest
> otherwise (e.g. /proc/mdstat and several kernel messages will mention
> "resync"). Please also see question 21 of the FAQ.
>
> If, however, while reading, a read error occurs, the check will 
> trigger the
> normal response to read errors which is to generate the 'correct' data 
> and try
> to write that out - so it is possible that a 'check' will trigger a 
> write.
> However in the absence of read errors it is read-only.
>
> Per md.txt:
>
>        resync        - redundancy is being recalculated after unclean
>                        shutdown or creation
>
>        repair        - A full check and repair is happening.  This is
>                        similar to 'resync', but was requested by the
>                        user, and the write-intent bitmap is NOT used to
>                        optimise the process.
>
>        check         - A full check of redundancy was requested and is
>                        happening.  This reads all block and checks
>                        them. A repair may also happen for some raid
>                        levels.

Unfortunately this does not specifically answer the question, even
though the sentence
"If, however, while reading, a read error occurs, the check will trigger
the normal response to read errors..."
seems to suggest that in the case of an uncorrectable read error the
drive will be kicked.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 19:02       ` Asdo
@ 2009-11-26 20:55         ` Justin Piszcz
  2009-11-27 13:39           ` Asdo
  0 siblings, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 20:55 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid



On Thu, 26 Nov 2009, Asdo wrote:

> Justin Piszcz wrote:
>> On Thu, 26 Nov 2009, Asdo wrote:
>> 
>>>>> 
>>>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>>>> upon read error, will it initiate a rebuild / degrade the array or not?
>> This is a good question but it is difficult to test as each use case is
>> different. That would be a question for Neil.
>> 
>>>>> 
>>>>> Anyway the "nodegrade" mode I suggest above would be still more useful 
>>>>> because you do not need to put the array in readonly mode, which is 
>>>>> important for doing backups during normal operation.
>>>>> 
>>>>> Coming back to my problem, I have thought that the best approach would 
>>>>> probably be to first collect information on how good are my 12 drives, 
>>>>> and I probably can do that by reading each device like
>>>>> dd if=/dev/sda of=/dev/null
>>>>> and see how many of them read with errors. I just hope my 3ware disk 
>>>>> controllers won't disconnect the whole drive upon read error.
>>>>> (anyone has a better strategy?)
>> I see where you're going here.  Read below but if you go this route I 
>> assume
>> you would first stop the array (?) mdadm -S /dev/mdX and then test each
>> individual disk one at a time?
>
> I don't plan to stop the array prior to reading the drives. Reads should not 
> be harmful...
> You think otherwise?
That depends on the drives. Have you ever tried to copy a file from a
failing drive? Some drives will start to click, reset, throw ATA errors,
etc. It depends on how the drive is failing and what is wrong with it.

> I think I did read the drives in the past on a mounted array and it was no 
> problem.
Good to hear, let us know which option you choose & what the outcome is!

Justin.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-26 20:55         ` Justin Piszcz
@ 2009-11-27 13:39           ` Asdo
  2009-11-27 18:11             ` Asdo
  0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-27 13:39 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
>>> I see where you're going here.  Read below but if you go this route 
>>> I assume
>>> you would first stop the array (?) mdadm -S /dev/mdX and then test each
>>> individual disk one at a time?
>>
>> I don't plan to stop the array prior to reading the drives. Reads 
>> should not be harmful...
>> You think otherwise?
> Depends on the drives, ever try to copy a file from a failing drive? Some
> drives will start to click/reset/ATA errors, etc.  Depends on how it is
> failing and what is wrong with it.

You are right, good thinking...

The drives are WD RE2, so they do have TLER, but the default error
recovery time is, I think, 7 seconds. So if I read a drive with dd and
it hits an unreadable area, it will retry for 7 seconds, and during that
time it will not respond to any request coming from MD; I think 7
seconds is probably enough for MD to kick the drive out of the array. If
the MD requests are queued in the elevator, maybe MD will wait... not
sure. I might help them queue in the elevator by disabling NCQ.
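
(I suppose something like the following would do it, assuming the disks
are still visible as plain sdX devices behind the 3ware controller:

  echo 1 > /sys/block/sda/device/queue_depth   # a queue depth of 1 effectively disables NCQ

repeated for each member disk.)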

TLER is configurable in theory (so the time could be lowered to, e.g.,
1 second), but I have never done it, and it seems I would need to reboot
into MS-DOS / FreeDOS (wdtler appears to be a DOS utility). The problem
is that the computer, and the array, are in heavy use.

This would be another question for Neil: is 7 seconds enough for MD to
kick out the drive, and will the drive still be kicked if the MD
requests to it are queued in the Linux elevator?

Without knowing this, I will probably opt for your way: rsync the data
out, starting with the smallest and most important stuff...

>> I think I did read the drives in the past on a mounted array and it 
>> was no problem.
I forgot to mention that in that case there were no read errors... :-D

> Good to hear, let us know which option you choose & what the outcome is!
>
> Justin.

Thank you

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-27 13:39           ` Asdo
@ 2009-11-27 18:11             ` Asdo
  2009-11-27 21:08               ` Justin Piszcz
  2009-11-27 21:21               ` Neil Brown
  0 siblings, 2 replies; 13+ messages in thread
From: Asdo @ 2009-11-27 18:11 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

Asdo wrote:
> ....
> Without knowing this I will probably opt for your way: rsync data out, 
> starting from the smallest and most important stuff...
> ...

I had another thought:

If I take the computer offline, boot with a live CD (one of the worst
messes here is that the root filesystem is also on that array), run the
RAID-6 array in READONLY MODE, maybe without spares, and then start a
check (scrub)...

If drives are kicked, I can probably reassemble --force the array and
it's as if nothing happened, right?
Since it was mounted readonly, I think it would be clean...
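
(By "reassemble --force" I mean something like this, with device names
just as an example:

  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sd[a-l]1
)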

The only problem would be if 1 or more drives died for good during the
procedure, but I hope that is unlikely...
If fewer than 3 drives die, I can still reassemble --force and take the
data out (at least SOME data; then, if it degrades again, reassemble
once more and try to get data out from another location...).

Do you agree?

I am starting to think that during the procedure for taking the data
out and/or attempting the first scrub, the main problem is write access
to the array: if a rebuild starts on a spare and then fails again, and
there were writes in the middle... I think I end up doomed. Probably
even reassemble --force would refuse to work for me. What do you think?

Thank you

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-27 18:11             ` Asdo
@ 2009-11-27 21:08               ` Justin Piszcz
  2009-11-27 21:21               ` Neil Brown
  1 sibling, 0 replies; 13+ messages in thread
From: Justin Piszcz @ 2009-11-27 21:08 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid



On Fri, 27 Nov 2009, Asdo wrote:

> Asdo wrote:
>> ....
>> Without knowing this I will probably opt for your way: rsync data out, 
>> starting from the smallest and most important stuff...
>> ...
>
> I had another thought:
>
> If I take the computer offline, boot with a livecd (one of the worst messes 
> here is that the root filesystem is also in that array), run the raid6 array 
> in READONLY MODE and maybe without spares, then I start a check (scrub) ...

>
> If drives are kicked I can probably reasseble --force the array and it's like 
> nothing happened, right?
If *more* than 2 drives fail, you would need to --force. Also, when you
do that (I have done it before), you need to fsck the filesystem, and
often many of your files end up in /lost+found, depending on how bad it
is. But in all of my tests the FS was R/W, not R/O, so I am unsure of
the outcome; it sounds like a possibility. With SW RAID-6 I have lost
two disks before and suffered no problems at all (you can't use Western
Digital Velociraptors in RAID); I was able to copy the data I needed off
of the array without any issues, but my server was not being as heavily
utilized as yours is.

I think you need to run smartctl on each of the drives, e.g.:

for disk in /dev/sd[a-z]
do
    echo "disk $disk" >> /tmp/a
    smartctl -a "$disk" >> /tmp/a
done

Inspect each disk: is there really a failing disk?

> Since it was mounted readonly I think it would be clean...
>
> Only problem would be if 1 or more drives definitively die during the 
> procedure, but I hope this is unlikely...
> If less than 3 drives die I can still reassemble --force, and take the data 
> out (at least SOME data, then if it degrades, reassemble again and try to get 
> out data from another location...)
>
> Do you agree?
I agree, but would you not want to just rsync the data off first before going
through all of this?

>
> I am starting to think that during the procedure for taking the data out 
> and/or attempt first scrubbing the main problem are write accesses to the 
> array, because if rebuild starts on a spare and then fails again and then 
> there were writes in the middle... I think I end up doomed. Probably even 
> reassemble --force would refuse to work on me. What do you think?
I think you have a point there. In my opinion:

1. Check each disk: how bad is it, really? (It seems like your array and
    disks are fine; one disk may have a reallocated sector or two, which
    is nothing to worry about.) In all seriousness, do any of the
    attributes say *FAILING NOW*?

2. If everything looks OK, copy all of the data off while the system is
    ONLINE and WORKING. It will be *MUCH* more difficult to extract
    pieces of data using dd_rescue and friends than to simply rsync the
    entire array to another host.

3. Do you not have another host to rsync to?  If that is the case, then
    we may need to approach this problem from a different angle. E.g.,
    making the array read-only after booting from a LiveCD may not be a
    bad idea, but doing that BEFORE you have rsync'd all the data off
    still risks all of the data on the array, whereas since it is
    currently up and running you could at least make a point-in-time
    copy of all of the data that lives there right now.

Justin.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-27 18:11             ` Asdo
  2009-11-27 21:08               ` Justin Piszcz
@ 2009-11-27 21:21               ` Neil Brown
  2009-12-02 10:15                 ` Asdo
  1 sibling, 1 reply; 13+ messages in thread
From: Neil Brown @ 2009-11-27 21:21 UTC (permalink / raw)
  To: Asdo; +Cc: Justin Piszcz, linux-raid

On Fri, 27 Nov 2009 19:11:31 +0100
Asdo <asdo@shiftmail.org> wrote:

> Asdo wrote:
> > ....
> > Without knowing this I will probably opt for your way: rsync data out, 
> > starting from the smallest and most important stuff...
> > ...
> 
> I had another thought:
> 
> If I take the computer offline, boot with a livecd (one of the worst 
> messes here is that the root filesystem is also in that array), run the 
> raid6 array in READONLY MODE and maybe without spares, then I start a 
> check (scrub) ...

If the array is marked read-only, it won't do a scrub.

However, if you simply don't have any filesystem mounted, then the array
will remain 'clean' and any failures are less likely to cause further
failures.

So doing it offline is a good idea, but setting the array to read-only
won't work.
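
i.e. something like this from the live CD (device names are only an
example):

  mdadm --assemble /dev/md0 /dev/sd[a-l]1     # assemble, but don't mount anything
  echo check > /sys/block/md0/md/sync_action  # start the scrub
  cat /proc/mdstat                            # watch progress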

NeilBrown

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Help on first dangerous scrub / suggestions
  2009-11-27 21:21               ` Neil Brown
@ 2009-12-02 10:15                 ` Asdo
  0 siblings, 0 replies; 13+ messages in thread
From: Asdo @ 2009-12-02 10:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> If the array is marked read-only, it wont do a scrub.
>
> However if you simply don't have any filesystem mounted, then the array
> will remain 'clean' and any failures are less likely cause further
> failures.
>
> So doing it off line is a good idea, but setting the array to read-only
> won't work.
>   

Thanks for the info, Neil.
Would mounting the filesystems read-only be safe in the same way?
(I might try to rsync some data out during the scrub.)
Thank you
Asdo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-12-02 10:15 UTC | newest]

Thread overview: 13+ messages
2009-11-26 12:14 Help on first dangerous scrub / suggestions Asdo
2009-11-26 12:22 ` Justin Piszcz
2009-11-26 14:06   ` Asdo
2009-11-26 14:38     ` Justin Piszcz
2009-11-26 19:02       ` Asdo
2009-11-26 20:55         ` Justin Piszcz
2009-11-27 13:39           ` Asdo
2009-11-27 18:11             ` Asdo
2009-11-27 21:08               ` Justin Piszcz
2009-11-27 21:21               ` Neil Brown
2009-12-02 10:15                 ` Asdo
2009-11-26 14:03 ` Mikael Abrahamsson
2009-11-26 14:13   ` Asdo
