* Help on first dangerous scrub / suggestions
@ 2009-11-26 12:14 Asdo
2009-11-26 12:22 ` Justin Piszcz
2009-11-26 14:03 ` Mikael Abrahamsson
0 siblings, 2 replies; 13+ messages in thread
From: Asdo @ 2009-11-26 12:14 UTC (permalink / raw)
To: linux-raid
Hi all
we have a server with a 12-disk RAID-6.
It has been up for 1 year now, but I have never scrubbed it because at
the time I did not know about this good practice (a note in the mdadm
man page would help).
The array is currently not degraded and has spares.
Now I am scared about initiating the first scrub, because if it turns out
that areas on 3 different disks have bad sectors, I think I am going to
lose the whole array.
Doing backups now is also scary: if I hit a bad (uncorrectable) area on
any one of the disks while reading, a rebuild will start on the spare,
and that is effectively the same as initiating the scrub, with all the
associated risks.
About this point, I would like to suggest a new "mode" for the array,
let's call it "nodegrade", in which no degradation can occur and I/O on
unreadable areas simply fails with an I/O error. By temporarily putting
the array into that mode, one could at least back up without anxiety. I
understand it would not be possible to add a spare or rebuild in this
mode, but that's OK.
BTW I would like to ask about the "readonly" mode mentioned here:
http://www.mjmwired.net/kernel/Documentation/md.txt
Upon a read error, will it initiate a rebuild / degrade the array or not?
Anyway, the "nodegrade" mode I suggest above would still be more useful,
because you would not need to put the array into readonly mode, which
matters when doing backups during normal operation.
Coming back to my problem, I have thought that the best approach would
probably be to first collect information on how healthy my 12 drives
are, and I can probably do that by reading each device, e.g.
dd if=/dev/sda of=/dev/null
and seeing how many of them read with errors. I just hope my 3ware disk
controllers won't disconnect the whole drive upon a read error.
(Does anyone have a better strategy?)
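Concretely, what I have in mind is roughly this minimal sketch (the
device names sda..sdl are just an example for my 12 members):

    # Read-test each member disk in turn; conv=noerror keeps reading past
    # bad sectors, and any errors show up on stderr and in dmesg.
    for d in /dev/sd[a-l]; do
        echo "=== $d ==="
        dd if="$d" of=/dev/null bs=1M conv=noerror
    done

(A read-only "badblocks -sv /dev/sdX" per drive would be an alternative
that also reports which block numbers fail.)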
But then, if it turns out that 3 of them do have unreadable areas, I am
screwed anyway. Even with dd_rescue there is no strategy that can save
my data, even if the unreadable areas are in different places on the 3
disks (and that is a case where it should, in principle, be possible to
get the data back).
This brings me to my second suggestion:
I would like to see 12 (in my case) devices like:
/dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
that behave like this: when reading from /dev/md0_fromparity/sda1, what
comes out is the bytes that should be on sda1, but computed from the
other disks. Reading from these devices should never degrade an array;
at most it should give a read error.
Why is this useful?
Because one could recover sda1 from a disaster-struck array with multiple
unreadable areas (unless too many of them overlap) in this way:
With the array in "nodegrade" mode and the block device marked readonly:
1. dd_rescue /dev/sda1 /dev/sdz1 [sdz is a good drive that will
eventually take sda's place]; take note of the failed sectors.
2. dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1, only for the
sectors that were unreadable above.
3. Stop the array, take out sda1, and reassemble the array with sdz1 in
place of sda1.
... then repeat for the other affected drives to get a good array back.
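In commands, roughly (purely hypothetical, since the _fromparity devices
do not exist today; device names and sector numbers are made up):

    # 1) Copy what is readable from the failing member onto its replacement,
    #    noting the bad ranges that dd_rescue reports.
    dd_rescue /dev/sda1 /dev/sdz1
    # 2) Fill in one reported bad range (say 64 sectors at sector 123456)
    #    from the hypothetical parity-reconstruction device.
    dd if=/dev/md0_fromparity/sda1 of=/dev/sdz1 bs=512 \
       skip=123456 seek=123456 count=64
    # 3) Reassemble with the rebuilt copy in place of the bad member.
    mdadm --stop /dev/md0
    mdadm --assemble /dev/md0 /dev/sd[b-l]1 /dev/sdz1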
What do you think?
I have another question on scrubbing: I am not sure about the exact
behaviour of "check" and "repair":
- Will "check" degrade an array if it finds an uncorrectable read error?
The manual only mentions what happens when the parity blocks don't match
the data, but that's not what I'm interested in right now.
- Will "repair"? (Same question as above.)
Thanks for your comments
* Re: Help on first dangerous scrub / suggestions
2009-11-26 12:14 Help on first dangerous scrub / suggestions Asdo
@ 2009-11-26 12:22 ` Justin Piszcz
2009-11-26 14:06 ` Asdo
2009-11-26 14:03 ` Mikael Abrahamsson
1 sibling, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 12:22 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid
On Thu, 26 Nov 2009, Asdo wrote:
> Hi all
> we have a server with a 12 disks raid-6.
> It has been up for 1 year now but I have never scrubbed it because at the
> time I did not know about this good practice (a note on man mdadm would
> help).
> The array is currently not degraded and has spares.
>
> Now I am scared about initiating the first scrub because if it turns out that
> 3 areas in different disks have bad sectors I think am gonna lose the whole
> array.
>
> Doing backups now it's also scary because if I hit a bad (uncorrectable) area
> in anyone of the disks while reading, a rebuild will start on the spare and
> that's like initiating the scrub with all associated risks.
>
> About this point, I would like to suggest a new "mode" of the array, let's
> call it "nodegrade" in which no degradation can occur, and I/O in unreadable
> areas simply fails with I/O error. By temporarily putting the array in that
> mode, at least one could backup without anxiety. I understand it would not be
> possible to add a spare / rebuild in this mode but that's ok.
>
> BTW I would like to ask an info on "readonly" mode mentioned here:
> http://www.mjmwired.net/kernel/Documentation/md.txt
> upon read error, will it initiate a rebuild / degrade the array or not?
>
> Anyway the "nodegrade" mode I suggest above would be still more useful
> because you do not need to put the array in readonly mode, which is important
> for doing backups during normal operation.
>
> Coming back to my problem, I have thought that the best approach would
> probably be to first collect information on how good are my 12 drives, and I
> probably can do that by reading each device like
> dd if=/dev/sda of=/dev/null
> and see how many of them read with errors. I just hope my 3ware disk
> controllers won't disconnect the whole drive upon read error.
> (anyone has a better strategy?)
>
> But then if it turns out that 3 of them indeed have unreadable areas I am
> screwed anyway. Even with dd_rescue there's no strategy that can save my
> data, even if the unreadable areas have different placement in the 3 disks
> (and that's a case where it should instead be possible to get data back).
>
> This brings to my second suggestion:
> I would like to see 12 (in my case) devices like:
> /dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
> that behave like this: when reading from /dev/md0_fromparity/sda1 , what
> comes out is the bytes that should be in sda1, but computed from the other
> disks. Reading from these devices should never degrade an array, at most give
> read error.
>
> Why is this useful?
> Because one could recover sda1 from a disastered array with multiple
> unreadable areas (unless too many are overlapping) in this way:
> With the array in "nodegrade" mode and blockdevice marked as readonly:
> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1 [sdz is a good drive to eventually
> take sda place]
> take note of failed sectors
> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the sectors
> that were unreadable from above
> 3- stop array, take out sda1, and reassemble the array with sdz1 in place of
> sda1
> ... repeat for all the other drives to get a good array back.
>
> What do you think?
>
> I have another question on scrubbing: I am not sure about the exact behaviour
> of "check" and "repair":
> - will "check" degrade an array if it finds an uncorrectable read-error? The
> manual only mentions what happens if the checksums of the parity disks don't
> match with data, but that's not what I'm interested in right now.
> - will "repair" .... (same question as above)
>
> Thanks for your comments
>
Have you gotten any filesystem errors thus far?
How bad are the disks?
Can you show the smartctl -a output of each of the 12 drives?
Can you rsync all of the data to another host?
What filesystem is being used?
If your disks are failing I'd recommend an rsync ASAP over trying to
read/write/test the disks with dd or other tests.
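For example (source path and target host are placeholders):

    # Get the most important data off first, then the rest.
    rsync -aHv --progress /mnt/raid/important/ backuphost:/backup/important/
    rsync -aHv --progress /mnt/raid/ backuphost:/backup/full/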
Justin.
* Re: Help on first dangerous scrub / suggestions
2009-11-26 12:14 Help on first dangerous scrub / suggestions Asdo
2009-11-26 12:22 ` Justin Piszcz
@ 2009-11-26 14:03 ` Mikael Abrahamsson
2009-11-26 14:13 ` Asdo
1 sibling, 1 reply; 13+ messages in thread
From: Mikael Abrahamsson @ 2009-11-26 14:03 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid
On Thu, 26 Nov 2009, Asdo wrote:
> Now I am scared about initiating the first scrub because if it turns out
> that 3 areas in different disks have bad sectors I think am gonna lose
> the whole array.
What kernel are you using?
As of 2.6.15 or so, sending "repair" (or "resync", I don't remember
exactly) to the md device will read all the data, and if a sector cannot
be read, the data will be reconstructed from parity and written back to
the bad sector (it shouldn't kick the disk).
<http://linux-raid.osdl.org/index.php/RAID_Administration>
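In practice that is just (assuming sysfs is mounted and the array is md0):

    # Kick off a repair pass; progress shows up in /proc/mdstat.
    echo repair > /sys/block/md0/md/sync_action
    cat /proc/mdstat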
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Help on first dangerous scrub / suggestions
2009-11-26 12:22 ` Justin Piszcz
@ 2009-11-26 14:06 ` Asdo
2009-11-26 14:38 ` Justin Piszcz
0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-26 14:06 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid
Justin Piszcz wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>> Hi all
>> we have a server with a 12 disks raid-6.
>> It has been up for 1 year now but I have never scrubbed it because at
>> the time I did not know about this good practice (a note on man mdadm
>> would help).
>> The array is currently not degraded and has spares.
>>
>> Now I am scared about initiating the first scrub because if it turns
>> out that 3 areas in different disks have bad sectors I think am gonna
>> lose the whole array.
>>
>> Doing backups now it's also scary because if I hit a bad
>> (uncorrectable) area in anyone of the disks while reading, a rebuild
>> will start on the spare and that's like initiating the scrub with all
>> associated risks.
>>
>> About this point, I would like to suggest a new "mode" of the array,
>> let's call it "nodegrade" in which no degradation can occur, and I/O
>> in unreadable areas simply fails with I/O error. By temporarily
>> putting the array in that mode, at least one could backup without
>> anxiety. I understand it would not be possible to add a spare /
>> rebuild in this mode but that's ok.
>>
>> BTW I would like to ask an info on "readonly" mode mentioned here:
>> http://www.mjmwired.net/kernel/Documentation/md.txt
>> upon read error, will it initiate a rebuild / degrade the array or not?
>>
>> Anyway the "nodegrade" mode I suggest above would be still more
>> useful because you do not need to put the array in readonly mode,
>> which is important for doing backups during normal operation.
>>
>> Coming back to my problem, I have thought that the best approach
>> would probably be to first collect information on how good are my 12
>> drives, and I probably can do that by reading each device like
>> dd if=/dev/sda of=/dev/null
>> and see how many of them read with errors. I just hope my 3ware disk
>> controllers won't disconnect the whole drive upon read error.
>> (anyone has a better strategy?)
>>
>> But then if it turns out that 3 of them indeed have unreadable areas
>> I am screwed anyway. Even with dd_rescue there's no strategy that can
>> save my data, even if the unreadable areas have different placement
>> in the 3 disks (and that's a case where it should instead be possible
>> to get data back).
>>
>> This brings to my second suggestion:
>> I would like to see 12 (in my case) devices like:
>> /dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
>> that behave like this: when reading from /dev/md0_fromparity/sda1 ,
>> what comes out is the bytes that should be in sda1, but computed from
>> the other disks. Reading from these devices should never degrade an
>> array, at most give read error.
>>
>> Why is this useful?
>> Because one could recover sda1 from a disastered array with multiple
>> unreadable areas (unless too many are overlapping) in this way:
>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1 [sdz is a good drive to
>> eventually take sda place]
>> take note of failed sectors
>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the
>> sectors that were unreadable from above
>> 3- stop array, take out sda1, and reassemble the array with sdz1 in
>> place of sda1
>> ... repeat for all the other drives to get a good array back.
>>
>> What do you think?
>>
>> I have another question on scrubbing: I am not sure about the exact
>> behaviour of "check" and "repair":
>> - will "check" degrade an array if it finds an uncorrectable
>> read-error? The manual only mentions what happens if the checksums of
>> the parity disks don't match with data, but that's not what I'm
>> interested in right now.
>> - will "repair" .... (same question as above)
>>
>> Thanks for your comments
>>
>
> Have you gotten any filesystem errors thus far?
> How bad are the disks?
Only one disk has given correctable read errors in dmesg, twice (no
filesystem errors), 64 consecutive sectors each time.
smartctl -a indeed reports those errors on that disk, and no errors on
any of the other disks.
(On the partially-bad disk:
SMART overall-health self-assessment test result: PASSED
...
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       138
...
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
The other disks show PASSED, 0, 0.)
However, I have never run smartctl self-tests, so the only errors SMART
is aware of are indeed the ones I also got from md.
> Can you show the smartctl -a output of each of the 12 drives?
> Can you rsync all of the data to another host?
> What filesystem is being used?
>
> If your disks are failing I'd recommend an rsync ASAP over trying to
> read/write/test the disks with dd or other tests.
The filesystem is ext3.
About the rsync I am worried; have you read my original post? If rsync
hits an area with uncorrectable read errors, the rebuild will start, and
if it then turns out that 2 other disks are partially unreadable, I will
lose the array. And I will lose it *right now*, without having known for
sure beforehand.
What drawbacks do you see in the dd test I proposed? It is just a probe
to get an idea of how bad the situation is, without changing the
situation yet...
* Re: Help on first dangerous scrub / suggestions
2009-11-26 14:03 ` Mikael Abrahamsson
@ 2009-11-26 14:13 ` Asdo
0 siblings, 0 replies; 13+ messages in thread
From: Asdo @ 2009-11-26 14:13 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid
Mikael Abrahamsson wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>> Now I am scared about initiating the first scrub because if it turns
>> out that 3 areas in different disks have bad sectors I think am gonna
>> lose the whole array.
>
> What kernel are you using?
>
> As of 2.6.15 or so, sending "repair" (or "resync", I don't remember
> exactly) to the md will read all data and if there is bad data, parity
> will be used to write to the bad sector (it shouldn't kick the disk).
>
> <http://linux-raid.osdl.org/index.php/RAID_Administration>
>
The kernel is Ubuntu's 2.6.24.
The page you link to doesn't mention that drives won't be kicked by a
"repair" or "check".
In fact, regarding "check" it says:
'check' just reads everything and doesn't trigger any writes unless a
read error is detected, in which case the normal read-error handling
kicks in.
"Normal read-error handling" seems to suggest that if the read error is
uncorrectable the drive will be kicked. Don't you think so?
Thank you
* Re: Help on first dangerous scrub / suggestions
2009-11-26 14:06 ` Asdo
@ 2009-11-26 14:38 ` Justin Piszcz
2009-11-26 19:02 ` Asdo
0 siblings, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 14:38 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid
On Thu, 26 Nov 2009, Asdo wrote:
>>>
>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>> upon read error, will it initiate a rebuild / degrade the array or not?
This is a good question but it is difficult to test as each use case is
different. That would be a question for Neil.
>>>
>>> Anyway the "nodegrade" mode I suggest above would be still more useful
>>> because you do not need to put the array in readonly mode, which is
>>> important for doing backups during normal operation.
>>>
>>> Coming back to my problem, I have thought that the best approach would
>>> probably be to first collect information on how good are my 12 drives, and
>>> I probably can do that by reading each device like
>>> dd if=/dev/sda of=/dev/null
>>> and see how many of them read with errors. I just hope my 3ware disk
>>> controllers won't disconnect the whole drive upon read error.
>>> (anyone has a better strategy?)
I see where you're going here. Read below, but if you go this route I
assume you would first stop the array (mdadm -S /dev/mdX) and then test
each individual disk, one at a time?
>>>
>>> But then if it turns out that 3 of them indeed have unreadable areas I am
>>> screwed anyway. Even with dd_rescue there's no strategy that can save my
>>> data, even if the unreadable areas have different placement in the 3 disks
>>> (and that's a case where it should instead be possible to get data back).
So wouldn't your priority be to copy/rsync the *MOST* important data off
the machine first, before resorting to more invasive methods?
>>>
>>> This brings to my second suggestion:
>>> I would like to see 12 (in my case) devices like:
>>> /dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
>>> that behave like this: when reading from /dev/md0_fromparity/sda1 , what
>>> comes out is the bytes that should be in sda1, but computed from the other
>>> disks. Reading from these devices should never degrade an array, at most
>>> give read error.
>>>
>>> Why is this useful?
>>> Because one could recover sda1 from a disastered array with multiple
>>> unreadable areas (unless too many are overlapping) in this way:
>>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1 [sdz is a good drive to
>>> eventually take sda place]
>>> take note of failed sectors
>>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for the
>>> sectors that were unreadable from above
>>> 3- stop array, take out sda1, and reassemble the array with sdz1 in place
>>> of sda1
>>> ... repeat for all the other drives to get a good array back.
>>>
>>> What do you think?
While this may be possible, has anyone on this list done something like
this and had it work successfully?
>>>
>>> I have another question on scrubbing: I am not sure about the exact
>>> behaviour of "check" and "repair":
>>> - will "check" degrade an array if it finds an uncorrectable read-error?
From README.checkarray:
'check' is a read-only operation, even though the kernel logs may suggest
otherwise (e.g. /proc/mdstat and several kernel messages will mention
"resync"). Please also see question 21 of the FAQ.
If, however, while reading, a read error occurs, the check will trigger the
normal response to read errors which is to generate the 'correct' data and try
to write that out - so it is possible that a 'check' will trigger a write.
However in the absence of read errors it is read-only.
Per md.txt:
resync - redundancy is being recalculated after unclean
shutdown or creation
repair - A full check and repair is happening. This is
similar to 'resync', but was requested by the
user, and the write-intent bitmap is NOT used to
optimise the process.
check - A full check of redundancy was requested and is
happening. This reads all block and checks
them. A repair may also happen for some raid
levels.
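So in practice a scrub would be started and watched roughly like this
(md0 as an example):

    # Start a check-only scrub and watch its progress.
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat
    # Once it finishes, the count of inconsistencies found:
    cat /sys/block/md0/md/mismatch_cnt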
>>> The manual only mentions what happens if the checksums of the parity disks
>>> don't match with data, but that's not what I'm interested in right now.
>>> - will "repair" .... (same question as above)
>>>
>>> Thanks for your comments
>>>
>>
>> Have you gotten any filesystem errors thus far?
>> How bad are the disks?
> Only one disk gave correctable read errors in dmesg twice (no filesystem
> errors), 64 sectors in sequence each time.
> Smartctl -a reports indeed those errors on that disk, and no errors on all
> the other disks.
> (
> on the partially-bad disk:
> SMART overall-health self-assessment test result: PASSED
> ...
> 1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always
> - 138
> ...
> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always
> - 0
> the other disks have values: PASSED, 0, 0
> )
> However I never ran smartctl tests, so the only errors smartctl is aware of
> are indeed those I also got from md.
Ouch. In addition, if you do not run the 'offline' test mentioned in the
smartctl man page, the offline-test-related attributes will NOT be
updated, so there is no way to tell how bad a disk really is; the SMART
statistics for those disks are unknown because they have never been
refreshed. I once had a really weird issue with an mdadm RAID-1 (two
Raptor 150s) where one disk kept dropping out of the array; I had not
run the offline test, got fed up with it all, and put the disks on a
3ware controller. Shortly thereafter I built a new RAID-1 with the same
disks and saw many reallocated sectors: the drive was on its way out.
Yet because I had never run an offline test, the disk had looked
completely FINE beforehand: all SMART self-tests (short, long) had
passed and the output of smartctl -a looked good too!
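For example (sda is just a placeholder; each test runs in the background
and takes a while, so run them one at a time):

    # Refresh the offline-collected attributes, then run a full surface scan.
    smartctl -t offline /dev/sda
    smartctl -t long /dev/sda
    # Afterwards, review the attributes and the self-test log.
    smartctl -A -l selftest /dev/sda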
>
>> Can you show the smartctl -a output of each of the 12 drives?
>> Can you rsync all of the data to another host?
>> What filesystem is being used?
>>
>> If your disks are failing I'd recommend an rsync ASAP over trying to
>> read/write/test the disks with dd or other tests.
> Filesystem is ext3
> For the rsync I am worried, have you read my original post? If rsync hits an
> area with uncorrectable read errors the rebuild will start and then if turns
> out there are other 2 partially-unreadable disks I will lose the array. And I
> will lose it *right now* and without knowing for sure before.
Per your other reply, it is plausible that what you describe may occur.
I have to ask, though: if you have 12 disks on a 3ware controller, why
are you not using hardware RAID-6? When a 3ware controller hits a read
error it simply remaps that sector and marks it bad, sector by sector,
and a drive does not drop out of the array until there are more than
100-300 reallocated sectors (if these are enterprise drives, and
depending on how the drive fails, of course).
Aside from that, if your array is, say, 50% full and you rsync, you only
need to read what is actually on the disks, not the entire array (as you
would with dd). It also lets you rsync your most important data off
first, in the order you choose. If you go ahead with the dd test and 3
disks fail during the process, what have you gained? There is a risk
either way: your method may carry less risk as long as no drive fails
completely during the read tests, whereas if you copy or rsync the data
you may or may not succeed, but in that scenario you (hopefully) end up
with the data in a second location, after which you can run all the
tests you want.
> What are the drawbacks you see against the dd test I proposed? It's just to
> probe to have an idea of how bad is the situation, without changing the
> situation yet...
Maybe none, as long as the dd test does not brick the drives; unlikely,
but it could happen.
* Re: Help on first dangerous scrub / suggestions
2009-11-26 14:38 ` Justin Piszcz
@ 2009-11-26 19:02 ` Asdo
2009-11-26 20:55 ` Justin Piszcz
0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-26 19:02 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid
Justin Piszcz wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>>>>
>>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>>> upon read error, will it initiate a rebuild / degrade the array or
>>>> not?
> This is a good question but it is difficult to test as each use case is
> different. That would be a question for Neil.
>
>>>>
>>>> Anyway the "nodegrade" mode I suggest above would be still more
>>>> useful because you do not need to put the array in readonly mode,
>>>> which is important for doing backups during normal operation.
>>>>
>>>> Coming back to my problem, I have thought that the best approach
>>>> would probably be to first collect information on how good are my
>>>> 12 drives, and I probably can do that by reading each device like
>>>> dd if=/dev/sda of=/dev/null
>>>> and see how many of them read with errors. I just hope my 3ware
>>>> disk controllers won't disconnect the whole drive upon read error.
>>>> (anyone has a better strategy?)
> I see where you're going here. Read below but if you go this route I
> assume
> you would first stop the array (?) mdadm -S /dev/mdX and then test each
> individual disk one at a time?
I don't plan to stop the array before reading the drives. Reads should
not be harmful...
Do you think otherwise?
I think I have read the drives in the past with the array mounted, and
it was no problem.
>>>>
>>>> But then if it turns out that 3 of them indeed have unreadable
>>>> areas I am screwed anyway. Even with dd_rescue there's no strategy
>>>> that can save my data, even if the unreadable areas have different
>>>> placement in the 3 disks (and that's a case where it should instead
>>>> be possible to get data back).
> So wouldn't your priority to copy/rsync the *MOST* important data off the
> machine first before resorting to more invasive methods?
Yeah, I will eventually do that if I find more than 2 drives with read
errors.
(A dd read of the individual drives is less invasive than rsync, imho.)
So you are saying that even if the dd reads find fewer than 2 disks with
read errors (which might even be correctable), you would still proceed
with a backup before the scrub?
(Actually I would also need to test the spares for write functionality,
heck...
Oh well... I have many spares...)
I really miss a "nodegrade" mode as described in my original post :-/
("undegradable" would probably be a more correct name, btw)
>>>>
>>>> This brings to my second suggestion:
>>>> I would like to see 12 (in my case) devices like:
>>>> /dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
>>>> that behave like this: when reading from /dev/md0_fromparity/sda1 ,
>>>> what comes out is the bytes that should be in sda1, but computed
>>>> from the other disks. Reading from these devices should never
>>>> degrade an array, at most give read error.
>>>>
>>>> Why is this useful?
>>>> Because one could recover sda1 from a disastered array with
>>>> multiple unreadable areas (unless too many are overlapping) in this
>>>> way:
>>>> With the array in "nodegrade" mode and blockdevice marked as readonly:
>>>> 1- dd_rescue if=/dev/sda1 of=/dev/sdz1 [sdz is a good drive to
>>>> eventually take sda place]
>>>> take note of failed sectors
>>>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1 only for
>>>> the sectors that were unreadable from above
>>>> 3- stop array, take out sda1, and reassemble the array with sdz1 in
>>>> place of sda1
>>>> ... repeat for all the other drives to get a good array back.
>>>>
>>>> What do you think?
> While this may be possibly, has anyone on this list done something
> like this
> and had it work successfully?
Nobody could have tried this, because the
/dev/md0_fromparity/{sda1,sdb1,...} devices do not exist. This is a
feature request...
>>>>
>>>> I have another question on scrubbing: I am not sure about the exact
>>>> behaviour of "check" and "repair":
>>>> - will "check" degrade an array if it finds an uncorrectable
>>>> read-error?
>> From README.checkarray:
>
> 'check' is a read-only operation, even though the kernel logs may suggest
> otherwise (e.g. /proc/mdstat and several kernel messages will mention
> "resync"). Please also see question 21 of the FAQ.
>
> If, however, while reading, a read error occurs, the check will
> trigger the
> normal response to read errors which is to generate the 'correct' data
> and try
> to write that out - so it is possible that a 'check' will trigger a
> write.
> However in the absence of read errors it is read-only.
>
> Per md.txt:
>
> resync - redundancy is being recalculated after unclean
> shutdown or creation
>
> repair - A full check and repair is happening. This is
> similar to 'resync', but was requested by the
> user, and the write-intent bitmap is NOT used to
> optimise the process.
>
> check - A full check of redundancy was requested and is
> happening. This reads all block and checks
> them. A repair may also happen for some raid
> levels.
Unfortunately this does not specifically answer the question, even
though the sentence
"If, however, while reading, a read error occurs, the check will trigger
the normal response to read errors..."
seems to suggest that in the case of an uncorrectable read error the
drive will be kicked.
* Re: Help on first dangerous scrub / suggestions
2009-11-26 19:02 ` Asdo
@ 2009-11-26 20:55 ` Justin Piszcz
2009-11-27 13:39 ` Asdo
0 siblings, 1 reply; 13+ messages in thread
From: Justin Piszcz @ 2009-11-26 20:55 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid
On Thu, 26 Nov 2009, Asdo wrote:
> Justin Piszcz wrote:
>> On Thu, 26 Nov 2009, Asdo wrote:
>>
>>>>>
>>>>> BTW I would like to ask an info on "readonly" mode mentioned here:
>>>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>>>> upon read error, will it initiate a rebuild / degrade the array or not?
>> This is a good question but it is difficult to test as each use case is
>> different. That would be a question for Neil.
>>
>>>>>
>>>>> Anyway the "nodegrade" mode I suggest above would be still more useful
>>>>> because you do not need to put the array in readonly mode, which is
>>>>> important for doing backups during normal operation.
>>>>>
>>>>> Coming back to my problem, I have thought that the best approach would
>>>>> probably be to first collect information on how good are my 12 drives,
>>>>> and I probably can do that by reading each device like
>>>>> dd if=/dev/sda of=/dev/null
>>>>> and see how many of them read with errors. I just hope my 3ware disk
>>>>> controllers won't disconnect the whole drive upon read error.
>>>>> (anyone has a better strategy?)
>> I see where you're going here. Read below but if you go this route I
>> assume
>> you would first stop the array (?) mdadm -S /dev/mdX and then test each
>> individual disk one at a time?
>
> I don't plan to stop the array prior to reading the drives. Reads should not
> be harmful...
> You think otherwise?
It depends on the drives; have you ever tried to copy a file from a
failing drive? Some drives will start to click, reset, throw ATA errors,
etc. It depends on how it is failing and what is wrong with it.
> I think I did read the drives in the past on a mounted array and it was no
> problem.
Good to hear, let us know which option you choose & what the outcome is!
Justin.
* Re: Help on first dangerous scrub / suggestions
2009-11-26 20:55 ` Justin Piszcz
@ 2009-11-27 13:39 ` Asdo
2009-11-27 18:11 ` Asdo
0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2009-11-27 13:39 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid
Justin Piszcz wrote:
>>> I see where you're going here. Read below but if you go this route
>>> I assume
>>> you would first stop the array (?) mdadm -S /dev/mdX and then test each
>>> individual disk one at a time?
>>
>> I don't plan to stop the array prior to reading the drives. Reads
>> should not be harmful...
>> You think otherwise?
> Depends on the drives, ever try to copy a file from a failing drive? Some
> drives will start to click/reset/ATA errors, etc. Depends on how it is
> failing and what is wrong with it.
You are right, good thinking...
The drives are WD RE2, so they do have TLER, but the default retry time
is, I think, 7 seconds. So if I read a drive with dd and it hits an
unreadable area, it will retry for 7 seconds, and during that time it
will not respond to any request coming from MD; I think 7 seconds is
probably enough for MD to kick the drive out of the array. If the MD
requests are queued in the elevator, maybe MD will wait... not sure. I
might help them queue in the elevator by disabling NCQ.
TLER is configurable in theory (e.g. to lower the time to 1 second), but
I have never done it, and it seems I would need to reboot into MS-DOS /
FreeDOS (wdtler appears to be a DOS utility). The problem is that the
computer, and the array, are in heavy use.
This would be another question for Neil: is 7 seconds enough for MD to
kick out the drive, and will the drive still be kicked if MD's requests
to it are queued in the Linux elevator?
Without knowing this I will probably opt for your way: rsync the data
out, starting with the smallest and most important stuff...
>> I think I did read the drives in the past on a mounted array and it
>> was no problem.
I forgot to mention that in that case there were no read errors... :-D
> Good to hear, let us know which option you choose & what the outcome is!
>
> Justin.
Thank you
* Re: Help on first dangerous scrub / suggestions
2009-11-27 13:39 ` Asdo
@ 2009-11-27 18:11 ` Asdo
2009-11-27 21:08 ` Justin Piszcz
2009-11-27 21:21 ` Neil Brown
0 siblings, 2 replies; 13+ messages in thread
From: Asdo @ 2009-11-27 18:11 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid
Asdo wrote:
> ....
> Without knowing this I will probably opt for your way: rsync data out,
> starting from the smallest and most important stuff...
> ...
I had another thought:
If I take the computer offline, boot from a live CD (one of the worst
messes here is that the root filesystem is also on that array), run the
RAID-6 array in READONLY MODE and maybe without spares, then I start a
check (scrub)...
If drives are kicked, I can probably reassemble --force the array and it
will be as if nothing happened, right?
Since it was mounted readonly, I think it would be clean...
The only problem would be if 1 or more drives definitively died during
the procedure, but I hope that is unlikely...
If fewer than 3 drives die I can still reassemble --force and take the
data out (at least SOME data; then, if it degrades again, reassemble
once more and try to get data out from another location...)
Do you agree?
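In commands, the recovery step would be roughly this (member names are
just an example):

    # If members get kicked during the attempt, try to put the array back
    # together as it was, then mark it read-only again before touching it.
    mdadm --stop /dev/md0
    mdadm --assemble --force /dev/md0 /dev/sd[a-l]1
    mdadm --readonly /dev/md0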
I am starting to think that, during the procedure of taking the data out
and/or attempting the first scrub, the main problem is write access to
the array: if a rebuild starts on a spare and then fails again, and
there were writes in the middle... I think I end up doomed. Probably
even reassemble --force would refuse to work for me. What do you think?
Thank you
* Re: Help on first dangerous scrub / suggestions
2009-11-27 18:11 ` Asdo
@ 2009-11-27 21:08 ` Justin Piszcz
2009-11-27 21:21 ` Neil Brown
1 sibling, 0 replies; 13+ messages in thread
From: Justin Piszcz @ 2009-11-27 21:08 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid
On Fri, 27 Nov 2009, Asdo wrote:
> Asdo wrote:
>> ....
>> Without knowing this I will probably opt for your way: rsync data out,
>> starting from the smallest and most important stuff...
>> ...
>
> I had another thought:
>
> If I take the computer offline, boot with a livecd (one of the worst messes
> here is that the root filesystem is also in that array), run the raid6 array
> in READONLY MODE and maybe without spares, then I start a check (scrub) ...
>
> If drives are kicked I can probably reasseble --force the array and it's like
> nothing happened, right?
If *more* than 2 drives fail, you would need to --force. Also, when you
do that (I have done it before), you need to fsck the filesystem, and
often many of your files will end up in /lost+found, depending on how
bad it is. But in all of my tests the FS was R/W, not R/O, so I am
unsure of the outcome; it sounds like a possibility. With SW RAID-6 I
have lost two disks before and suffered no problems at all (you can't
use Western Digital Velociraptors in RAID); I was able to copy the data
I needed off the array without any issues, but my server was not being
heavily utilized the way yours is.
I think you need to run smartctl on each of the drives, e.g.:

    # Dump the SMART data of every drive into one file for review.
    for disk in /dev/sd[a-z]
    do
        echo "disk $disk" >> /tmp/a
        smartctl -a "$disk" >> /tmp/a
    done

Inspect each disk: is there really a failing disk?
> Since it was mounted readonly I think it would be clean...
>
> Only problem would be if 1 or more drives definitively die during the
> procedure, but I hope this is unlikely...
> If less than 3 drives die I can still reassemble --force, and take the data
> out (at least SOME data, then if it degrades, reassemble again and try to get
> out data from another location...)
>
> Do you agree?
I agree, but would you not want to just rsync the data off first before going
through all of this?
>
> I am starting to think that during the procedure for taking the data out
> and/or attempt first scrubbing the main problem are write accesses to the
> array, because if rebuild starts on a spare and then fails again and then
> there were writes in the middle... I think I end up doomed. Probably even
> reassemble --force would refuse to work on me. What do you think?
I think you have a point there. In my opinion:
1. Check each disk: how bad is it, really? (In all seriousness, it seems
   like your array and disks are fine; one disk may have a reallocated
   sector or two, which is nothing to worry about.) Do any of the
   attributes say *FAILING NOW*?
2. If everything looks OK, copy all of the data off while the system is
   ONLINE and WORKING. It will be *MUCH* more difficult to extract
   pieces of data with dd_rescue and friends than to simply rsync the
   entire array to another host.
3. Do you not have another host to rsync to? If that is the case, we may
   need to approach this problem from a different angle. E.g., making
   the array read-only after booting from a LiveCD may not be a bad
   idea, but doing that BEFORE you have rsync'd all the data off still
   risks all of the data on the array, whereas while it is currently up
   and running you could at least make a point-in-time copy of all the
   data that lives there right now.
Justin.
* Re: Help on first dangerous scrub / suggestions
2009-11-27 18:11 ` Asdo
2009-11-27 21:08 ` Justin Piszcz
@ 2009-11-27 21:21 ` Neil Brown
2009-12-02 10:15 ` Asdo
1 sibling, 1 reply; 13+ messages in thread
From: Neil Brown @ 2009-11-27 21:21 UTC (permalink / raw)
To: Asdo; +Cc: Justin Piszcz, linux-raid
On Fri, 27 Nov 2009 19:11:31 +0100
Asdo <asdo@shiftmail.org> wrote:
> Asdo wrote:
> > ....
> > Without knowing this I will probably opt for your way: rsync data out,
> > starting from the smallest and most important stuff...
> > ...
>
> I had another thought:
>
> If I take the computer offline, boot with a livecd (one of the worst
> messes here is that the root filesystem is also in that array), run the
> raid6 array in READONLY MODE and maybe without spares, then I start a
> check (scrub) ...
If the array is marked read-only, it won't do a scrub.
However, if you simply don't have any filesystem mounted, then the array
will remain 'clean' and any failures are less likely to cause further
failures.
So doing it offline is a good idea, but setting the array to read-only
won't work.
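A quick way to see which state the array is in (md0 as an example):

    # 'clean' means no writes are pending; 'readonly' is the state that
    # would also suppress the scrub.
    cat /sys/block/md0/md/array_state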
NeilBrown
* Re: Help on first dangerous scrub / suggestions
2009-11-27 21:21 ` Neil Brown
@ 2009-12-02 10:15 ` Asdo
0 siblings, 0 replies; 13+ messages in thread
From: Asdo @ 2009-12-02 10:15 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> If the array is marked read-only, it wont do a scrub.
>
> However if you simply don't have any filesystem mounted, then the array
> will remain 'clean' and any failures are less likely cause further
> failures.
>
> So doing it off line is a good idea, but setting the array to read-only
> won't work.
>
Thanks for the info, Neil.
Would mounting the filesystems read-only be safe in the same way?
(I might try to rsync some data out during the scrub.)
Thank you
Asdo