From: Asdo <asdo@shiftmail.org>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Help on first dangerous scrub / suggestions
Date: Thu, 26 Nov 2009 15:06:45 +0100
Message-ID: <4B0E8B75.2030006@shiftmail.org>
In-Reply-To: <alpine.DEB.2.00.0911260720220.23317@p34.internal.lan>

Justin Piszcz wrote:
> On Thu, 26 Nov 2009, Asdo wrote:
>
>> Hi all
>> we have a server with a 12-disk RAID-6.
>> It has been up for 1 year now, but I have never scrubbed it because at
>> the time I did not know about this good practice (a note in the mdadm
>> man page would help).
>> The array is currently not degraded and has spares.
>>
>> Now I am scared of initiating the first scrub, because if it turns out
>> that 3 areas on different disks have bad sectors, I think I am going to
>> lose the whole array.
>>
>> Doing backups now is also scary, because if I hit a bad
>> (uncorrectable) area on any one of the disks while reading, a rebuild
>> will start on the spare, and that's like initiating the scrub with all
>> the associated risks.
>>
>> On this point, I would like to suggest a new "mode" for the array,
>> let's call it "nodegrade", in which no degradation can occur and I/O
>> to unreadable areas simply fails with an I/O error. By temporarily
>> putting the array in that mode, one could at least back up without
>> anxiety. I understand it would not be possible to add a spare or
>> rebuild in this mode, but that's OK.
>>
>> BTW, I would like to ask about the "readonly" mode mentioned here:
>> http://www.mjmwired.net/kernel/Documentation/md.txt
>> Upon a read error, will it initiate a rebuild / degrade the array or not?
>>
>> Anyway, the "nodegrade" mode I suggest above would still be more
>> useful, because you would not need to put the array in readonly mode,
>> which is important for doing backups during normal operation.
>>
>> Coming back to my problem, I think the best approach would probably
>> be to first collect information on how healthy my 12 drives are, and I
>> can probably do that by reading each device with
>> dd if=/dev/sda of=/dev/null
>> and seeing how many of them read with errors. I just hope my 3ware disk
>> controllers won't disconnect the whole drive upon a read error.
>> (Does anyone have a better strategy?)
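Below is a minimal sketch of the read-only probe described above. It is written in Python rather than dd so that it keeps going past unreadable blocks and records where they are. The name `probe_device` and the 64 KiB block size are illustrative assumptions. Reading the member devices directly bypasses md, so a read error here should not by itself make md fail the member. The controller may still react to it, as feared above.

```python
import os
import sys


def probe_device(path, block_size=64 * 1024):
    """Read `path` from start to end and return the byte offsets of the
    blocks that could not be read. The device is only opened read-only."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # For a block device (or a file), seeking to the end gives its size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                # Unreadable block (e.g. EIO): remember it and keep going.
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return bad_offsets


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py /dev/sda /dev/sdb ...
    for dev in sys.argv[1:]:
        bad = probe_device(dev)
        print(f"{dev}: {len(bad)} unreadable block(s)")
```

Each bad 64 KiB block still costs whatever retries the kernel and controller perform, so a badly damaged drive can take a long time to probe.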
>>
>> But then, if it turns out that 3 of them do have unreadable areas,
>> I am screwed anyway. Even with dd_rescue there is no strategy that can
>> save my data, even if the unreadable areas are in different places on
>> the 3 disks (and that's a case where it should be possible to get the
>> data back).
>>
>> This brings me to my second suggestion:
>> I would like to see 12 (in my case) devices like:
>> /dev/md0_fromparity/{sda1,sdb1,...}   (all read-only)
>> that behave like this: when reading from /dev/md0_fromparity/sda1,
>> what comes out are the bytes that should be in sda1, but computed from
>> the other disks. Reading from these devices should never degrade the
>> array; at most it should return a read error.
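The proposed /dev/md0_fromparity devices do not exist. The sketch below only illustrates the per-stripe arithmetic such a device would rely on, in the simplest case: one data chunk is missing and the P parity, a plain XOR of the data chunks, is readable. It ignores md's rotation of parity across disks. It also ignores the Q syndrome, which needs Galois-field arithmetic to recover a second missing member. The function names are hypothetical.

```python
import operator
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need a non-empty list of equally sized blocks")
    # zip(*blocks) yields one tuple of byte values per column.
    return bytes(reduce(operator.xor, column) for column in zip(*blocks))


def rebuild_from_p(surviving_data, p_block):
    """P is the XOR of all data chunks in the stripe, so a single missing
    data chunk equals the XOR of the surviving data chunks and P."""
    return xor_blocks(list(surviving_data) + [p_block])
```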
>>
>> Why is this useful?
>> Because one could recover sda1 from a badly damaged array with multiple
>> unreadable areas (unless too many of them overlap) in this way,
>> with the array in "nodegrade" mode and the block device marked read-only:
>> 1- dd_rescue /dev/sda1 /dev/sdz1   [sdz is a good drive that will
>>    eventually take sda's place]
>>    take note of the failed sectors
>> 2- dd_rescue from /dev/md0_fromparity/sda1 to /dev/sdz1, only for the
>>    sectors that were unreadable in step 1
>> 3- stop the array, take out sda1, and reassemble the array with sdz1 in
>>    place of sda1
>> ... repeat for all the other drives to get a good array back.
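Steps 1 and 2 amount to merging two images sector by sector. Below is a minimal in-memory sketch. It assumes 512-byte sectors and assumes step 1 produced a list of unreadable sector numbers. `merge_rescued` is a hypothetical helper, not part of dd_rescue.

```python
SECTOR_SIZE = 512  # assumed sector size


def merge_rescued(rescued, bad_sectors, from_parity, sector_size=SECTOR_SIZE):
    """Start from the image copied directly from the disk (step 1) and
    overwrite only the sectors that were unreadable there with the same
    sectors computed from parity (step 2)."""
    if len(rescued) != len(from_parity):
        raise ValueError("both images must be the same size")
    merged = bytearray(rescued)
    for sector in bad_sectors:
        start = sector * sector_size
        end = start + sector_size
        if end > len(merged):
            raise ValueError(f"sector {sector} is beyond the end of the image")
        merged[start:end] = from_parity[start:end]
    return bytes(merged)
```

Only the sectors listed as bad are taken from the parity-derived copy. Everything that was readable on the original disk is kept as is.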
>>
>> What do you think?
>>
>> I have another question on scrubbing: I am not sure about the exact
>> behaviour of "check" and "repair":
>> - will "check" degrade an array if it finds an uncorrectable
>> read error? The manual only mentions what happens if the parity
>> blocks don't match the data, but that's not what I'm interested in
>> right now.
>> - will "repair" .... (same question as above)
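For reference, both scrub modes are started through md's sysfs interface, the equivalent of `echo check > /sys/block/md0/md/sync_action`. The number of inconsistencies found is reported in `mismatch_cnt`. Below is a minimal Python sketch. The array name `md0` is an assumption, as is the `sysfs_root` parameter, which exists only so the code can be run against a scratch directory. The sketch does not answer how read errors are handled during the scrub.

```python
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys"):
    """Directory holding md's per-array control files."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(action="check", array="md0", sysfs_root="/sys"):
    """Write 'check' or 'repair' to sync_action (needs root on a real system)."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


def scrub_status(array="md0", sysfs_root="/sys"):
    """Return (current sync action, mismatch count) for the array."""
    d = md_dir(array, sysfs_root)
    action = (d / "sync_action").read_text().strip()
    mismatches = int((d / "mismatch_cnt").read_text().strip())
    return action, mismatches
```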
>>
>> Thanks for your comments
>>
>
> Have you gotten any filesystem errors thus far?
> How bad are the disks?
Only one disk gave correctable read errors in dmesg, twice (no filesystem
errors), 64 sectors in sequence each time.
smartctl -a does report those errors on that disk, and no errors on any
of the other disks.
(
on the partially-bad disk:
 SMART overall-health self-assessment test result: PASSED
 ...
 1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       138
 ...
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
the other disks have values: PASSED, 0, 0
)
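To compare the 12 drives quickly, the attribute rows of `smartctl -a` can be split into their columns. Below is a rough sketch. It assumes the column order shown above: ID, name, flag, value, worst, threshold, type, updated, when-failed, raw. The raw column is kept as a string because some drives print extra text there. `parse_smart_attribute` is a hypothetical helper.

```python
SMART_FIELDS = ("id", "name", "flag", "value", "worst", "thresh",
                "type", "updated", "when_failed", "raw")


def parse_smart_attribute(line):
    """Split one attribute row from `smartctl -a` into a dict.
    Everything after the ninth column is kept as the raw value."""
    parts = line.split(None, len(SMART_FIELDS) - 1)
    if len(parts) != len(SMART_FIELDS):
        raise ValueError(f"unexpected attribute row: {line!r}")
    attr = dict(zip(SMART_FIELDS, parts))
    # The normalized columns are plain decimal numbers (e.g. "051").
    for key in ("id", "value", "worst", "thresh"):
        attr[key] = int(attr[key])
    return attr
```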
However, I have never run any smartctl self-tests, so the only errors
smartctl knows about are the same ones md reported.

> Can you show the smartctl -a output of each of the 12 drives?
> Can you rsync all of the data to another host?
> What filesystem is being used?
>
> If your disks are failing I'd recommend an rsync ASAP over trying to 
> read/write/test the disks with dd or other tests.
Filesystem is ext3.
About the rsync, I am worried; have you read my original post? If rsync
hits an area with uncorrectable read errors, the rebuild will start, and
then if it turns out there are 2 other partially-unreadable disks I will
lose the array. And I will lose it *right now*, without knowing for
sure beforehand.
What drawbacks do you see in the dd test I proposed? It's just a probe
to get an idea of how bad the situation is, without changing the
situation yet...



