* Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-26 12:14 UTC (permalink / raw)
To: linux-raid
Hi all
We have a server with a 12-disk raid-6.
It has been up for 1 year now, but I have never scrubbed it because at the
time I did not know about this good practice (a note in the mdadm man page
would help).
The array is currently not degraded and has spares.
Now I am scared about initiating the first scrub, because if it turns out
that 3 areas on different disks have bad sectors, I think I am going to
lose the whole array.
Doing backups now is also scary, because if I hit a bad (uncorrectable)
area on any one of the disks while reading, a rebuild will start on the
spare, and that is like initiating the scrub with all the associated risks.
On this point, I would like to suggest a new "mode" for the array, let's
call it "nodegrade", in which no degradation can occur and I/O on
unreadable areas simply fails with an I/O error. By temporarily putting the
array in that mode, one could at least back up without anxiety. I
understand it would not be possible to add a spare / rebuild in this mode,
but that's OK.
BTW, I would like to ask about the "readonly" mode mentioned here:
http://www.mjmwired.net/kernel/Documentation/md.txt
Upon a read error, will it initiate a rebuild / degrade the array, or not?
Anyway, the "nodegrade" mode I suggest above would still be more useful,
because you would not need to put the array in readonly mode, which is
important for doing backups during normal operation.
Coming back to my problem, I think the best approach would probably be to
first collect information on how good my 12 drives are, and I can probably
do that by reading each device with something like
dd if=/dev/sda of=/dev/null
and seeing how many of them read with errors. I just hope my 3ware disk
controllers won't disconnect the whole drive on a read error.
(Does anyone have a better strategy?)
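Concretely, I was thinking of something along these lines (a rough,
untested sketch; it assumes the array members are /dev/sd[a-l], adjust to
the real device names):

    for d in /dev/sd[a-l]; do
        echo "=== $d ==="
        # read the whole device, keep going on errors, discard the data
        dd if="$d" of=/dev/null bs=1M iflag=direct conv=noerror 2>&1 | tail -n 3
    done
    # then check the kernel log for fresh media errors
    dmesg | grep -iE 'end_request|medium error|unrecovered' | tail -n 20

The idea is only to count which drives report errors, not to change
anything on them.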
But then, if it turns out that 3 of them do have unreadable areas, I am
screwed anyway. Even with dd_rescue there is no strategy that can save my
data, even if the unreadable areas are placed differently on the 3 disks
(and that is a case where it should, in principle, be possible to get the
data back).
This brings me to my second suggestion:
I would like to see 12 (in my case) devices like:
/dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
that behave like this: when reading from /dev/md0_fromparity/sda1, what
comes out are the bytes that should be on sda1, but computed from the
other disks. Reading from these devices should never degrade an array; at
most it gives a read error.
Why is this useful?
Because one could recover sda1 from a wrecked array with multiple
unreadable areas (unless too many of them overlap) in this way:
With the array in "nodegrade" mode and the block device marked as readonly:
1- dd_rescue if=/dev/sda1 of=/dev/sdz1 [sdz is a good drive that will
eventually take sda's place]; take note of the failed sectors
2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1, only for the
sectors that were unreadable in step 1
3- stop the array, take out sda1, and reassemble the array with sdz1 in
place of sda1
... then repeat for all the other drives to get a good array back.
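For steps 1 and 2, with GNU ddrescue the idea might look roughly like this
(just a sketch: the md0_fromparity device is exactly the feature I am
requesting, so it does not exist today, and sdz1 stands for the
replacement drive):

    # step 1: copy the readable sectors of sda1, recording bad areas in a mapfile
    ddrescue -d /dev/sda1 /dev/sdz1 sda1.map
    # step 2 (hypothetical device): re-run with the same mapfile against the
    # parity-reconstructed view, so only areas still marked bad are retried
    ddrescue -d -r1 /dev/md0_fromparity/sda1 /dev/sdz1 sda1.map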
What do you think?
I have another question on scrubbing: I am not sure about the exact
behaviour of "check" and "repair":
- will "check" degrade an array if it finds an uncorrectable read error?
The manual only mentions what happens if the parity does not match the
data, but that is not what I am interested in right now.
- will "repair" ...? (same question as above)
Thanks for your comments
* Re: Help on first dangerous scrub / suggestions
From: Justin Piszcz @ 2009-11-26 12:22 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid

On Thu, 26 Nov 2009, Asdo wrote:
> We have a server with a 12-disk raid-6.
> [...]
> Thanks for your comments

Have you gotten any filesystem errors thus far?
How bad are the disks?

Can you show the smartctl -a output of each of the 12 drives?
Can you rsync all of the data to another host?
What filesystem is being used?

If your disks are failing I'd recommend an rsync ASAP over trying to
read/write/test the disks with dd or other tests.

Justin.
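For the rsync route, something as simple as this would do (a sketch; the
destination host and paths are placeholders):

    rsync -aHAX --progress /mnt/array/ backuphost:/backup/array/

Doing the smallest and most important directories first means the critical
data is already safe if a drive drops out partway through.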
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-26 14:06 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> Have you gotten any filesystem errors thus far?
> How bad are the disks?

Only one disk gave correctable read errors in dmesg, twice (no filesystem
errors), 64 sectors in sequence each time.
smartctl -a does indeed report those errors on that disk, and no errors on
all the other disks.
(
on the partially-bad disk:
SMART overall-health self-assessment test result: PASSED
...
  1 Raw_Read_Error_Rate     0x000f   200   200   051   Pre-fail  Always   -   138
...
  5 Reallocated_Sector_Ct   0x0033   200   200   140   Pre-fail  Always   -   0
the other disks have values: PASSED, 0, 0
)
However, I have never run the smartctl self-tests, so the only errors
smartctl is aware of are the ones I also got from md.

> Can you show the smartctl -a output of each of the 12 drives?
> Can you rsync all of the data to another host?
> What filesystem is being used?
>
> If your disks are failing I'd recommend an rsync ASAP over trying to
> read/write/test the disks with dd or other tests.

The filesystem is ext3.

For the rsync I am worried: have you read my original post? If rsync hits
an area with uncorrectable read errors, the rebuild will start, and then
if it turns out there are two other partially-unreadable disks I will lose
the array. And I will lose it *right now*, without knowing for sure
beforehand.

What are the drawbacks you see in the dd test I proposed? It is just a
probe to get an idea of how bad the situation is, without changing it
yet...
* Re: Help on first dangerous scrub / suggestions
From: Justin Piszcz @ 2009-11-26 14:38 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid

On Thu, 26 Nov 2009, Asdo wrote:
>>> BTW, I would like to ask about the "readonly" mode mentioned here:
>>> http://www.mjmwired.net/kernel/Documentation/md.txt
>>> Upon a read error, will it initiate a rebuild / degrade the array, or not?

This is a good question, but it is difficult to test as each use case is
different. That would be a question for Neil.

>>> Coming back to my problem, I think the best approach would probably be
>>> to first collect information on how good my 12 drives are, and I can
>>> probably do that by reading each device with something like
>>> dd if=/dev/sda of=/dev/null

I see where you're going here. Read below, but if you go this route I
assume you would first stop the array (mdadm -S /dev/mdX) and then test
each individual disk one at a time?

>>> But then, if it turns out that 3 of them do have unreadable areas, I am
>>> screwed anyway.

So wouldn't your priority be to copy/rsync the *MOST* important data off
the machine first, before resorting to more invasive methods?

>>> This brings me to my second suggestion:
>>> I would like to see 12 (in my case) devices like:
>>> /dev/md0_fromparity/{sda1,sdb1,...} (all readonly)
>>> [...]
>>> What do you think?

While this may be possible, has anyone on this list done something like
this and had it work successfully?

>>> I have another question on scrubbing: I am not sure about the exact
>>> behaviour of "check" and "repair":
>>> - will "check" degrade an array if it finds an uncorrectable read error?

From README.checkarray:

  'check' is a read-only operation, even though the kernel logs may
  suggest otherwise (e.g. /proc/mdstat and several kernel messages will
  mention "resync"). Please also see question 21 of the FAQ.

  If, however, while reading, a read error occurs, the check will trigger
  the normal response to read errors, which is to generate the 'correct'
  data and try to write that out - so it is possible that a 'check' will
  trigger a write. However, in the absence of read errors it is read-only.

Per md.txt:

  resync - redundancy is being recalculated after unclean shutdown
           or creation

  repair - A full check and repair is happening. This is similar to
           'resync', but was requested by the user, and the write-intent
           bitmap is NOT used to optimise the process.

  check  - A full check of redundancy was requested and is happening.
           This reads all blocks and checks them. A repair may also
           happen for some raid levels.

>> Have you gotten any filesystem errors thus far?
>> How bad are the disks?
> Only one disk gave correctable read errors in dmesg, twice (no filesystem
> errors), 64 sectors in sequence each time.
> [...]
> However, I have never run the smartctl self-tests, so the only errors
> smartctl is aware of are the ones I also got from md.

Ouch. In addition, if you do not run the 'offline' test mentioned in the
smartctl manpage, all of the offline-test-related statistics will NOT be
updated, so there is no way to tell how bad the disks really are; the
smartctl statistics for those disks are unknown because they have not been
updated.

I had a REAL weird issue once with an mdadm raid-1 where one disk kept
dropping out of the array (two Raptor 150s); I had not run the offline test
and got fed up with it all and put them on a 3ware controller. Shortly
thereafter, I built a new RAID-1 with the same disks and saw many
re-allocated sectors; the drive was on its way out. However, since I had
not run an offline test before, the disk looked completely FINE: all smart
tests had passed (short, long) and the output from smartctl -a looked good
too!

>> If your disks are failing I'd recommend an rsync ASAP over trying to
>> read/write/test the disks with dd or other tests.
> For the rsync I am worried: have you read my original post? If rsync hits
> an area with uncorrectable read errors, the rebuild will start, and then
> if it turns out there are two other partially-unreadable disks I will
> lose the array.

Per your other reply, it is plausible that what you are saying may occur.
I have to ask, though: if you have 12 disks on a 3ware controller, why are
you not using HW RAID-6? Whenever there are read errors on a 3ware
controller, it simply remaps each bad sector and marks it as bad, and it
does not drop the drive out of the array until there are more than 100-300
reallocated sectors (if these are enterprise drives, and depending on how
the drive fails, of course).

Aside from that, if your array is, say, 50% full and you rsync, you only
need to read what is on the disks and not the entire array (as you would
need to do with the dd). In addition, this would also allow you to rsync
your most important data off at your choosing.

If you go ahead with the dd test and through it you find 3 disks fail
during the process, what have you gained? There is a risk either way: your
method may bear less risk as long as no drives completely fail during your
read tests. Whereas if you copy or rsync the data, you may be successful,
or not; however, in the second scenario you (hopefully) end up with the
data in a second location, on which you can then run all of the tests you
want thereafter.

> What are the drawbacks you see in the dd test I proposed? It is just a
> probe to get an idea of how bad the situation is, without changing it
> yet...

Maybe... as long as the dd test does not brick the drives (unlikely), but
it could happen.
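For reference, the self-tests can be started per drive roughly like this
(a sketch; for disks behind a 3ware controller, smartctl may need the
-d 3ware,N device option described in its man page):

    smartctl -t offline /dev/sda   # update the offline-collected attributes
    smartctl -t long /dev/sda      # extended self-test (full surface read)
    # later, once the tests have had time to finish:
    smartctl -l selftest /dev/sda  # self-test log
    smartctl -A /dev/sda           # attributes, incl. offline-updated values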
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-26 19:02 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> I see where you're going here. Read below, but if you go this route I
> assume you would first stop the array (mdadm -S /dev/mdX) and then test
> each individual disk one at a time?

I don't plan to stop the array prior to reading the drives. Reads should
not be harmful... Do you think otherwise?
I think I have read the drives on a mounted array in the past and it was
no problem.

> So wouldn't your priority be to copy/rsync the *MOST* important data off
> the machine first, before resorting to more invasive methods?

Yes, I will eventually do that if I find more than 2 drives with read
errors. (A dd read of the individual drives is less invasive than rsync,
imho.)
So you are saying that even if I find fewer than 2 disks with read errors
(which might even be correctable) in the dd reads, you would still proceed
with a backup before the scrub?
(Actually I would also need to test the spares for write functionality,
heck... Oh well... I have many spares...)
I really miss a "nodegrade" mode as described in my original post :-/
("undegradeable" would probably be a more correct name, btw)

> While this may be possible, has anyone on this list done something like
> this and had it work successfully?

Nobody could have tried it this way, because the
/dev/md0_fromparity/{sda1,sdb1,...} devices do not exist. This is a
feature request...

> From README.checkarray:
> [...]
> If, however, while reading, a read error occurs, the check will trigger
> the normal response to read errors, which is to generate the 'correct'
> data and try to write that out - so it is possible that a 'check' will
> trigger a write. However, in the absence of read errors it is read-only.

Unfortunately this does not specifically answer the question, even though
the sentence "If, however, while reading, a read error occurs, the check
will trigger the normal response to read errors..." seems to suggest that
in case of an uncorrectable read error the drive will be kicked.
* Re: Help on first dangerous scrub / suggestions
From: Justin Piszcz @ 2009-11-26 20:55 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid

On Thu, 26 Nov 2009, Asdo wrote:
> I don't plan to stop the array prior to reading the drives. Reads should
> not be harmful... Do you think otherwise?

It depends on the drives. Have you ever tried to copy a file from a
failing drive? Some drives will start to click, reset, throw ATA errors,
etc. It depends on how the drive is failing and what is wrong with it.

> I think I have read the drives on a mounted array in the past and it was
> no problem.

Good to hear; let us know which option you choose and what the outcome is!

Justin.
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-27 13:39 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Justin Piszcz wrote:
> It depends on the drives. Have you ever tried to copy a file from a
> failing drive? Some drives will start to click, reset, throw ATA errors,
> etc. It depends on how the drive is failing and what is wrong with it.

You are right, good thinking...

The drives are WD RE2, so they do have TLER, but the default retry time
is, I think, 7 seconds. So if I read a drive with dd and it hits an
unreadable area, it will retry for 7 seconds, and during that time it will
not be responsive to any request coming from MD, and I think 7 seconds is
probably enough for MD to kick the drive out of the array. If the MD
requests are queued in the elevator, maybe MD will wait... not sure. I
might help them queue in the elevator by disabling NCQ.

TLER is configurable in theory (e.g. to lower the time to 1 second), but I
have never done it, and it seems I would need to reboot into MS-DOS /
FreeDOS (wdtler appears to be a DOS utility). The problem is that the
computer is in heavy use, and so is the array.

This would be another question for Neil: whether 7 seconds is enough for
MD to kick out the drive, and whether the drive will still be kicked if
the MD requests to it are queued in the Linux elevator.

Without knowing this, I will probably opt for your way: rsync the data
out, starting from the smallest and most important stuff...

> > I think I have read the drives on a mounted array in the past and it
> > was no problem.

I forgot to mention that in that case there were no read errors... :-D

> Good to hear; let us know which option you choose and what the outcome is!

Thank you
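One knob that can be changed without rebooting into DOS is the kernel's
per-device command timeout in sysfs, which gives the controller more
headroom before it gives up on a slow drive (a sketch, assuming the disks
appear as ordinary SCSI devices; whether it changes md's decision to kick
the drive is not certain):

    cat /sys/block/sda/device/timeout    # default is typically 30 seconds
    echo 120 > /sys/block/sda/device/timeout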
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-27 18:11 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Asdo wrote:
> Without knowing this, I will probably opt for your way: rsync the data
> out, starting from the smallest and most important stuff...

I had another thought:

If I take the computer offline, boot with a livecd (one of the worst
messes here is that the root filesystem is also on that array), run the
raid6 array in READONLY MODE and maybe without spares, and then start a
check (scrub)...

If drives are kicked, I can probably reassemble --force the array and it
is as if nothing happened, right? Since it was mounted readonly I think it
would be clean...

The only problem would be if one or more drives die for good during the
procedure, but I hope this is unlikely... If fewer than 3 drives die I can
still reassemble --force and take the data out (at least SOME data; then
if it degrades, reassemble again and try to get data out from another
location...).

Do you agree?

I am starting to think that during the procedure of taking the data out
and/or attempting the first scrub, the main problem is write access to the
array, because if a rebuild starts on a spare, then fails again, and there
were writes in the middle... I think I end up doomed. Probably even
reassemble --force would refuse to work for me. What do you think?

Thank you
* Re: Help on first dangerous scrub / suggestions
From: Justin Piszcz @ 2009-11-27 21:08 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid

On Fri, 27 Nov 2009, Asdo wrote:
> If drives are kicked, I can probably reassemble --force the array and it
> is as if nothing happened, right?

If *more* than 2 drives fail you would need to --force. Also, when you do
that (I have done it before), you need to fsck the filesystem, and often
many of your files will end up in /lost+found, depending on how bad it is.
But in all of my tests the FS was R/W, not R/O, so I am unsure of the
outcome; it sounds like a possibility.

With SW RAID-6 I have lost two disks before and suffered no problems at
all (you can't use Western Digital Velociraptors in RAID); I was able to
copy the data I needed off the array without any issues, but my server was
not being as heavily utilized as yours is.

I think you need to run smartctl on each of the drives, e.g.:

for disk in /dev/sd[a-z]
do
    echo "disk $disk" >> /tmp/a
    smartctl -a $disk >> /tmp/a
done

Inspect each disk: is there really a failing disk?

> The only problem would be if one or more drives die for good during the
> procedure, but I hope this is unlikely...
> Do you agree?

I agree, but would you not want to just rsync the data off first, before
going through all of this?

> I am starting to think that during the procedure of taking the data out
> and/or attempting the first scrub, the main problem is write access to
> the array [...]

I think you have a point there. In my opinion:

1. Check each disk: how bad is it, really? (It seems like your array and
   disks are fine; one disk may have a re-allocated sector or two, nothing
   to worry about, in all seriousness.) Do any of the attributes say
   *FAILING NOW*?

2. If everything looks OK, copy all of the data off while the system is
   ONLINE and WORKING. It will be *MUCH* more difficult trying to extract
   pieces of data using dd_rescue and friends vs. just rsyncing the entire
   array to another host.

3. Do you not have another host to rsync to? If that is the case, then we
   may need to approach this problem from a different angle. E.g., making
   it read-only after booting from a LiveCD may not be a bad idea, but
   doing that BEFORE you have rsync'd all the data off still risks all of
   the data on the array, whereas since it is currently up and running you
   could at least make a point-in-time copy of all of the data that lives
   there right now.

Justin.
* Re: Help on first dangerous scrub / suggestions
From: Neil Brown @ 2009-11-27 21:21 UTC (permalink / raw)
To: Asdo; +Cc: Justin Piszcz, linux-raid

On Fri, 27 Nov 2009 19:11:31 +0100 Asdo <asdo@shiftmail.org> wrote:
> If I take the computer offline, boot with a livecd (one of the worst
> messes here is that the root filesystem is also on that array), run the
> raid6 array in READONLY MODE and maybe without spares, and then start a
> check (scrub)...

If the array is marked read-only, it won't do a scrub.

However, if you simply don't have any filesystem mounted, then the array
will remain 'clean' and any failures are less likely to cause further
failures.

So doing it off line is a good idea, but setting the array to read-only
won't work.

NeilBrown
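A sketch of the offline procedure Neil describes (device names are
illustrative; nothing on the array is mounted while the check runs):

    # from the live CD: assemble the array, but do not mount any filesystem
    mdadm --assemble /dev/md0 /dev/sd[a-l]1
    # start the scrub and watch its progress
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat
    # when it finishes, inspect the result
    cat /sys/block/md0/md/mismatch_cnt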
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-12-02 10:15 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> However, if you simply don't have any filesystem mounted, then the array
> will remain 'clean' and any failures are less likely to cause further
> failures.
>
> So doing it off line is a good idea, but setting the array to read-only
> won't work.

Thanks for the info, Neil. Would mounting the filesystems read-only be
safe in the same way? (I might try to rsync some data out during the
scrub.)

Thank you
Asdo
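One detail to watch with ext3 here (a hedged note, not an answer to
whether the array stays 'clean'): if the filesystem was not cleanly
unmounted, even a plain read-only mount replays the journal, which writes
to the array. The noload mount option skips that (device and mount point
are illustrative):

    mount -o ro,noload /dev/md0 /mnt/array   # read-only, skip journal replay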
* Re: Help on first dangerous scrub / suggestions
From: Mikael Abrahamsson @ 2009-11-26 14:03 UTC (permalink / raw)
To: Asdo; +Cc: linux-raid

On Thu, 26 Nov 2009, Asdo wrote:
> Now I am scared about initiating the first scrub, because if it turns
> out that 3 areas on different disks have bad sectors, I think I am going
> to lose the whole array.

What kernel are you using?

As of 2.6.15 or so, sending "repair" (or "resync", I don't remember
exactly) to the md will read all the data, and if there is bad data,
parity will be used to rewrite the bad sector (it shouldn't kick the
disk).

<http://linux-raid.osdl.org/index.php/RAID_Administration>

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Help on first dangerous scrub / suggestions
From: Asdo @ 2009-11-26 14:13 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: linux-raid

Mikael Abrahamsson wrote:
> What kernel are you using?
>
> As of 2.6.15 or so, sending "repair" (or "resync", I don't remember
> exactly) to the md will read all the data, and if there is bad data,
> parity will be used to rewrite the bad sector (it shouldn't kick the
> disk).
>
> <http://linux-raid.osdl.org/index.php/RAID_Administration>

The kernel is the Ubuntu 2.6.24 kernel.

On the page you link to, I don't see any mention that drives won't be
kicked by a "repair" or "check". In fact, regarding "check", it says:

  'check' just reads everything and doesn't trigger any writes unless a
  read error is detected, in which case the normal read-error handling
  kicks in.

The "normal error handling" seems to suggest that if the read error is
uncorrectable the drive will be kicked. You don't think so?

Thank you