* 3-disk fail on raid-6, examining my options...
@ 2017-07-18 17:20 Maarten
From: Maarten @ 2017-07-18 17:20 UTC (permalink / raw)
To: linux-raid
Argh.. Murphy can be such a troll... :(
Hi all,
While I was in the process of migrating all my raid-6 arrays to raid-1
arrays (with either two or three member disks), I got stung severely.
(Obviously I shouldn't have been so stupid as to write to an array that
was not yet fully copied, but it is too late to undo that now.)
What probably happened:
A six-disk raid-6 array suffered a simultaneous two-disk failure which
went unnoticed for a number of hours, and then inevitably got hit by a
-catastrophic- 3rd disk failure during the following night.
The first two disks that failed have exactly identical event counters
according to mdadm -E <disk device>, which leads me to believe that it
was probably the SATA card/controller that failed/oops'ed, not the disks
themselves. But at this point that has not yet been verified.
The third disk, and the array, have a substantially higher event
counter. This makes complete sense, since the array was being actively
_written_ to at the time. (Yes, alas...) *Bangs head against desk*
Now from what I've gathered over the years and from earlier incidents, I
have now 1 (one) chance left to rescue data off this array; by hopefully
cloning the bad 3rd-failed drive with the aid of dd_rescue and
re-assembling --force the fully-degraded array. (Only IF that drive is
still responsive and can be cloned)
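Concretely, I'm thinking of something along these lines (sdn1 is just a
placeholder for a partition on a fresh disk at least as large as the
failed one; I'd use GNU ddrescue with a map file so an interrupted run
can resume):
  # first pass: grab everything that reads cleanly, skip the bad areas
  ddrescue -f -n /dev/sdd1 /dev/sdn1 /root/sdd1.map
  # second pass: retry the remaining bad areas a few times
  ddrescue -f -r3 /dev/sdd1 /dev/sdn1 /root/sdd1.map
  # then, with the original sdd removed, try a forced (degraded) assembly
  mdadm --stop /dev/md0
  mdadm --assemble --force --run /dev/md0 /dev/sdh1 /dev/sdj1 /dev/sdm1 /dev/sdn1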
My feeling is that the two ``good'' drives with the lower event counter
are now more useful as paperweights than for restoring any data... But
I'd like to have certainty before I try other ways to restore (or
recreate) data...
Is there any hope?
Here are some snippets from mdadm:
md0 : active raid6 sdh1[10] sdj1[8] sdm1[9] sdb1[3](F) sde1[7](F) sdd1[6](F)
7799470080 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/3] [U___UU]
mdadm -E /dev/sde1:
Update Time : Mon Jul 17 15:49:44 2017
Checksum : fdc7fdd7 - correct
Events : 58235
Array State : AAAAAA ('A' == active, '.' == missing)
mdadm -E /dev/sdb1:
Update Time : Mon Jul 17 15:49:44 2017
Checksum : cd97800c - correct
Events : 58235
Array State : AAAAAA ('A' == active, '.' == missing)
mdadm -E /dev/sdd1:
Update Time : Tue Jul 18 01:47:33 2017
Checksum : d00eff1d - correct
Events : 69129
Array State : AA..AA ('A' == active, '.' == missing)
mdadm --detail /dev/md0
Failed Devices : 3
Events : 69132
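(For completeness, the per-disk lines above came from something along
the lines of:
  for d in /dev/sd[bdehjm]1; do
    mdadm -E "$d" | grep -E 'Update Time|Events|Array State'
  done
)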
Thanks for any insights...
regards,
Maarten
* Re: 3-disk fail on raid-6, examining my options...
@ 2017-07-18 20:20 ` Wols Lists
From: Wols Lists @ 2017-07-18 20:20 UTC (permalink / raw)
To: Maarten, linux-raid
On 18/07/17 18:20, Maarten wrote:
> Now from what I've gathered over the years and from earlier incidents, I
> have now 1 (one) chance left to rescue data off this array; by hopefully
> cloning the bad 3rd-failed drive with the aid of dd_rescue and
> re-assembling --force the fully-degraded array. (Only IF that drive is
> still responsive and can be cloned)
If it clones successfully, great. If it clones, but with badblocks, I
keep on asking - is there any way we can work together to turn
dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
This project is mentioned on the wiki, the idea being that you can tell
the hard drive to return a read error if the computer tries to access
certain blocks. This "fake" read error means that we can then use a
partially copied disk to rebuild an array knowing that we will get an
error rather than silent corruption if we try and access the faulty block.
At least then, we know what is damaged rather than trying to
integrity-check an entire disk in the hope that we'll detect the corruption.
Cheers,
Wol
* Re: 3-disk fail on raid-6, examining my options...
@ 2017-07-18 20:25 ` Wakko Warner
From: Wakko Warner @ 2017-07-18 20:25 UTC (permalink / raw)
To: Wols Lists; +Cc: Maarten, linux-raid
Wols Lists wrote:
> On 18/07/17 18:20, Maarten wrote:
> > Now from what I've gathered over the years and from earlier incidents, I
> > have now 1 (one) chance left to rescue data off this array; by hopefully
> > cloning the bad 3rd-failed drive with the aid of dd_rescue and
> > re-assembling --force the fully-degraded array. (Only IF that drive is
> > still responsive and can be cloned)
>
> If it clones successfully, great. If it clones, but with badblocks, I
> keep on asking - is there any way we can work together to turn
> dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
I wrote a shell script that will output a device mapper table to do this.
It will do either zero or error targets for failed blocks. It's not
automatic and does require a block device (loop for files). I've used this
several times at work and works for me.
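To give an idea, the table it writes looks something like this (made-up
numbers: a 1000-sector device with sectors 500-507 unreadable; swap
"error" for "zero" if you'd rather get zeros back than I/O errors):
  0   500 linear /dev/sdX 0
  500 8   error
  508 492 linear /dev/sdX 508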
I'm not sure if this is what you're talking about or not, but if you want
the script, I'll post it.
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.
* Re: 3-disk fail on raid-6, examining my options...
@ 2017-07-18 21:29 ` Maarten
From: Maarten @ 2017-07-18 21:29 UTC (permalink / raw)
To: linux-raid
On 07/18/2017 10:25 PM, Wakko Warner wrote:
> Wols Lists wrote:
>> On 18/07/17 18:20, Maarten wrote:
>>> Now from what I've gathered over the years and from earlier incidents, I
>>> have now 1 (one) chance left to rescue data off this array; by hopefully
>>> cloning the bad 3rd-failed drive with the aid of dd_rescue and
>>> re-assembling --force the fully-degraded array. (Only IF that drive is
>>> still responsive and can be cloned)
>>
>> If it clones successfully, great. If it clones, but with badblocks, I
>> keep on asking - is there any way we can work together to turn
>> dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
>
> I wrote a shell script that will output a device mapper table to do this.
> It will do either zero or error targets for failed blocks. It's not
> automatic and does require a block device (loop for files). I've used this
> several times at work and works for me.
>
> I'm not sure if this is what you're talking about or not, but if you want
> the script, I'll post it.
For me, I don't think it will make much difference. On top of the array
there are a number of LVM volumes. For most of them I have full and
current backups, and some of the space is [now] free. There are two
volumes holding data that is both important to me and not backed up
recently enough.
Those two volumes together take up about 33%-40% of the total size, so
the chance of bad sectors affecting them is also (somewhat) smaller.
And the data will still be valuable to me even if it has some silent
corruption.
No, my main question, to which I seek a definitive answer, is whether
the two drives that failed earlier hold anything of worth, or whether
salvaging any data using them is out of the question.
In the meantime, I'm occupying myself with copying the data I wanted to
put onto the array to a remote system, and with making sure all my
backups and copies that were not redundant get proper redundancy. I will
not 'touch' the machine with the broken array until all that is sorted
(it has another raid-6 array, which is healthy... for now at least).
I hope the 3rd failed drive won't deteriorate during that time, but
under the circumstances, I'm going to take that risk nonetheless.
regards,
Maarten
* Re: 3-disk fail on raid-6, examining my options...
@ 2017-07-19 11:51 ` Wols Lists
From: Wols Lists @ 2017-07-19 11:51 UTC (permalink / raw)
To: Wakko Warner; +Cc: Maarten, linux-raid
On 18/07/17 21:25, Wakko Warner wrote:
> Wols Lists wrote:
>> On 18/07/17 18:20, Maarten wrote:
>>> Now from what I've gathered over the years and from earlier incidents, I
>>> have now 1 (one) chance left to rescue data off this array; by hopefully
>>> cloning the bad 3rd-failed drive with the aid of dd_rescue and
>>> re-assembling --force the fully-degraded array. (Only IF that drive is
>>> still responsive and can be cloned)
>>
>> If it clones successfully, great. If it clones, but with badblocks, I
>> keep on asking - is there any way we can work together to turn
>> dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
>
> I wrote a shell script that will output a device mapper table to do this.
> It will do either zero or error targets for failed blocks. It's not
> automatic and does require a block device (loop for files). I've used this
> several times at work and works for me.
>
> I'm not sure if this is what you're talking about or not, but if you want
> the script, I'll post it.
>
I'm not sure I understand what you're saying, but I'm certainly
interested. It'll probably end up on the wiki if that's okay with you?
I'll aim to understand and document it so others will be able hopefully
to use it as a "fire and forget" tool (inasmuch as you can
fire-and-forget any recovery task :-)
What I'm thinking of is a utility that uses "hdparm --make-bad-sector".
The idea being that if you have multiple disk failures, you can at least
clone everything worth having off the broken disks, and then you can run
a "tar . > /dev/null" or do a sync or whatever, and know that if it
reads successfully off the array it isn't corrupt. Unless you're unlucky
enough to have multiple drives fail in the same stripe, you should then
recover your array no problem.
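Something along these lines, purely for the sake of argument (the sector
number, device name and mount point are invented, and --make-bad-sector
really does destroy that sector, so you'd only ever point it at the
clone):
  # mark one unreadable LBA as bad on the cloned disk (sector is made up)
  hdparm --make-bad-sector 123456789 --yes-i-know-what-i-am-doing /dev/sdX
  # then read everything back through the filesystem; damaged files show
  # up as read errors instead of silently wrong data
  cd /mnt/array && tar cf - . > /dev/null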
Cheers,
Wol
* Re: 3-disk fail on raid-6, examining my options...
@ 2017-07-19 17:09 ` Wakko Warner
From: Wakko Warner @ 2017-07-19 17:09 UTC (permalink / raw)
To: Wols Lists; +Cc: Maarten, linux-raid
Wols Lists wrote:
> On 18/07/17 21:25, Wakko Warner wrote:
> > Wols Lists wrote:
> >> On 18/07/17 18:20, Maarten wrote:
> >>> Now from what I've gathered over the years and from earlier incidents, I
> >>> have now 1 (one) chance left to rescue data off this array; by hopefully
> >>> cloning the bad 3rd-failed drive with the aid of dd_rescue and
> >>> re-assembling --force the fully-degraded array. (Only IF that drive is
> >>> still responsive and can be cloned)
> >>
> >> If it clones successfully, great. If it clones, but with badblocks, I
> >> keep on asking - is there any way we can work together to turn
> >> dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
> >
> > I wrote a shell script that will output a device mapper table to do this.
> > It will do either zero or error targets for failed blocks. It's not
> > automatic and does require a block device (loop for files). I've used this
> > several times at work and works for me.
> >
> > I'm not sure if this is what you're talking about or not, but if you want
> > the script, I'll post it.
> >
> I'm not sure I understand what you're saying, but I'm certainly
> interested. It'll probably end up on the wiki if that's okay with you?
That's fine.
> I'll aim to understand and document it so others will be able hopefully
> to use it as a "fire and forget" tool (inasmuch as you can
> fire-and-forget any recovery task :-)
>
> What I'm thinking of is a utility that uses "hdparm --make-bad-sector".
> The idea being that if you have multiple disk failures, you can at least
> clone everything worth having off the broken disks, and then you can run
> a "tar . > /dev/null" or do a sync or whatever, and know that if it
> reads successfully off the array it isn't corrupt. Unless you're unlucky
> enough to have multiple drives fail in the same stripe, you should then
> recover your array no problem.
That's pretty much how I use it in a way.
Here's a real ddrescue log from one that I did:
# Rescue Logfile. Created by GNU ddrescue version 1.16
# Command line: ddrescue -s 85900394496 /dev/sdg /path/to/image.img /path/to/image.log
# current_pos current_status
0xA078F9C00 +
# pos size status
0x00000000 0xA078F9000 +
0xA078F9000 0x00001000 -
0xA078FA000 0x9F8806000 +
0x1400100000 0x2638A2E000 ?
I use losetup to make /path/to/image.img a block device.
I run the script I wrote:
sh ddlog-to-dm.sh /dev/loop0 < /path/to/image.log
Which outputs the following:
0 84133832 linear /dev/loop0 0
84133832 8 error
84133840 83640368 linear /dev/loop0 84133840
Then I run:
dmsetup create sometarget
I paste in the output, and I now have /dev/mapper/sometarget, which
returns errors at the location that was bad. Since it uses device
mapper, the error part doesn't retry. This also works with real hard
disks instead of images.
To work with a real disk, skip the losetup part and use /dev/sdX instead
of /dev/loop0. In my case above, assuming I had cloned sdg to sdh, I
would do:
sh ddlog-to-dm.sh /dev/sdh < /path/to/image.log
dmsetup create sdh
Then use /dev/mapper/sdh.
If you're familiar with device mapper, you'll know the mapped device has
no partition nodes of its own; you have to create another target for
those. I use kpartx -a for this, and when I'm done, kpartx -d to tear it
down.
When you're done, dmsetup remove sometarget and remove the loop device.
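So end to end, for an image file, it's roughly this (paths are just
examples; dmsetup create with no table argument reads the table from
stdin, so the paste step can simply be a pipe):
  losetup -f --show /path/to/image.img    # attaches and prints e.g. /dev/loop0
  sh ddlog-to-dm.sh /dev/loop0 < /path/to/image.log | dmsetup create sometarget
  kpartx -a /dev/mapper/sometarget        # expose any partitions inside it
  # ... recover data from the partition mappings kpartx creates ...
  kpartx -d /dev/mapper/sometarget
  dmsetup remove sometarget
  losetup -d /dev/loop0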
I have attached the script.
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.
[-- Attachment #2: ddlog-to-dm.sh --]
[-- Type: application/x-sh, Size: 2205 bytes --]