linux-raid.vger.kernel.org archive mirror
* raid 5, drives marked as failed. Can I recover?
@ 2009-01-29 23:17 Tom
  2009-01-30 14:18 ` Justin Piszcz
  0 siblings, 1 reply; 5+ messages in thread
From: Tom @ 2009-01-29 23:17 UTC (permalink / raw)
  To: linux-raid

Hello,

2 drives have failed on my raid5 setup and I need to recover the data
on the raid.
I am sure that the drives still work, or at least one of them does.

How do I recover my drives?

I can't mount the raid anymore, and a hard drive is missing when I
run ls /dev/sd?
I have 7 drives in my raid.

Here is the output of /var/log/messages at the following link:

http://matx.pastebin.com/m35423452

Also, some more information:

tom@desu ~ $ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
[raid4] [multipath]
md2 : inactive sdc1[1] sdd1[4] sdf1[3] sde1[2]
      1953214208 blocks


tom@desu ~ $ sudo mdadm --detail /dev/md2
Password:
/dev/md2:
        Version : 00.90.03
  Creation Time : Thu Sep  4 20:14:31 2008
     Raid Level : raid5
  Used Dev Size : 488303552 (465.68 GiB 500.02 GB)
   Raid Devices : 7
  Total Devices : 4
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Jan 29 21:16:34 2009
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cf3d1948:1d0e65b6:c028c7c8:56f0c54c
         Events : 0.1411738

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1
       4       8       49        4      active sync   /dev/sdd1
       5       0        0        5      removed
       6       0        0        6      removed




Thank you for your time in advance.


* Re: raid 5, drives marked as failed. Can I recover?
  2009-01-29 23:17 raid 5, drives marked as failed. Can I recover? Tom
@ 2009-01-30 14:18 ` Justin Piszcz
  2009-01-30 14:56   ` David Greaves
  0 siblings, 1 reply; 5+ messages in thread
From: Justin Piszcz @ 2009-01-30 14:18 UTC (permalink / raw)
  To: Tom; +Cc: linux-raid

Try to assemble the array with --force.
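
For example (a sketch only, not a definitive recipe: the device names below are
taken from your --detail output plus the two drives that dropped out, so adjust
them to whatever your system actually shows after a reboot):

  mdadm --stop /dev/md2
  mdadm --assemble --force /dev/md2 /dev/sd[a-g]1

--force tells mdadm to ignore the stale event counts on the kicked members; a
degraded 7-drive RAID5 still needs at least 6 of them present to start.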



* Re: raid 5, drives marked as failed. Can I recover?
  2009-01-30 14:18 ` Justin Piszcz
@ 2009-01-30 14:56   ` David Greaves
  2009-01-30 15:06     ` Tom
  0 siblings, 1 reply; 5+ messages in thread
From: David Greaves @ 2009-01-30 14:56 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Tom, linux-raid

Justin Piszcz wrote:
> Try to assemble the array with --force.
hmmmm? not yet...


> On Thu, 29 Jan 2009, Tom wrote:
> 
>> Hello,
>>
>> 2 drives have failed on my raid5 setup and I need to recover the data
>> on the raid.
>> I am sure that the drives still work, or at least one of them does.
>>
>> How do I recover my drives?

How important is the data?
The more important it is, the more you should reduce the risk of a subsequent
failure.
If you "don't care", then we just force it back together and cross our fingers.
Otherwise we run tests on all the drives before trying a restore.
I'd run these tests on each drive (example invocations follow the list). As a
minimum, run the first test on the failed drives; the more paranoid you are,
the more tests you add and the more you include the non-failed drives as well
(to ensure they don't fail during recovery):
* smartctl -t short
* smartctl -t long
* badblocks
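
These are sketches only: /dev/sdX is a placeholder for each drive you test, and
badblocks is read-only by default but still takes hours on a 500 GB drive:

  smartctl -t short /dev/sdX    # quick self-test, a couple of minutes
  smartctl -t long /dev/sdX     # full surface self-test, several hours
  smartctl -a /dev/sdX          # read back the results and the error counters
  badblocks -sv /dev/sdX        # read-only scan of the whole device

Only use the destructive write test (badblocks -w) on drives whose contents you
no longer need.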

What happened? Smoke?
Are the drives faulty (what does smartctl -a tell you)?
Did the cables just wiggle? Is the controller broken?
You probably don't know :)

I would obtain replacements for the failed drives and use ddrescue to copy each
failed drive onto a replacement (see the example below).
Then install the good copies and begin recovery.
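
A minimal sketch, assuming GNU ddrescue, with /dev/old standing in for a failing
drive, /dev/new for its equal-or-larger replacement, and rescue.log as a
hypothetical log file (both device names are placeholders):

  ddrescue /dev/old /dev/new rescue.log

The log file makes the copy resumable, so an interrupted run carries on without
re-reading the sectors it already recovered.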

>> I can't mount the raid anymore, and a hard drive is missing when I
>> run ls /dev/sd?
>> I have 7 drives in my raid.
You say you have 7 drives and 2 have failed, yet I see only 4 drives in the
array, not 5.

Where is sdg?

>> Here is the output of /var/log/messages at the following link:
>>
>> http://matx.pastebin.com/m35423452

Jan 29 21:14:11  sda died
Jan 29 21:14:12  sdb died

>>
>> Also, some more information:
Also need:
 Distro
 Kernel version
 mdadm version
 mdadm --examine output for each available component (see the example after
 this list).
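
For example, something like this gathers it in one go (the /dev/sd[a-g]1 glob is
a guess based on your /proc/mdstat, so adjust it to the partitions that actually
exist):

  for d in /dev/sd[a-g]1; do echo "== $d =="; mdadm --examine "$d"; done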

David

-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."


* Re: raid 5, drives marked as failed. Can I recover?
  2009-01-30 14:56   ` David Greaves
@ 2009-01-30 15:06     ` Tom
  2009-01-30 15:24       ` David Greaves
  0 siblings, 1 reply; 5+ messages in thread
From: Tom @ 2009-01-30 15:06 UTC (permalink / raw)
  To: David Greaves, jpiszcz; +Cc: linux-raid

Hello,

I spent a night trying out mdadm --assemble on a virtual machine to
see how it attempts to fix a raid where 2 or more drives have been
marked faulty.
I was quite sure that the drives were fine and that they were wrongly
marked as bad.
I think I just have a bad ATA controller.

I used --assemble on the real machine and it seems to have detected the raid again.
One drive was found to be bad and the array is rebuilding onto it now.
But my data is there and I can open it.
I am going to get some DVDs and back all this up before it dies again!


Regards and thanks for your help!




* Re: raid 5, drives marked as failed. Can I recover?
  2009-01-30 15:06     ` Tom
@ 2009-01-30 15:24       ` David Greaves
  0 siblings, 0 replies; 5+ messages in thread
From: David Greaves @ 2009-01-30 15:24 UTC (permalink / raw)
  To: Tom; +Cc: jpiszcz, linux-raid

Tom wrote:
> Hello,
> 
> I spent a night trying out mdadm --assemble on a virtual machine to
> see how it attempts to fix a raid where 2 or more drives have been
> marked faulty.
> I was quite sure that the drives were fine and that they were wrongly
> marked as bad.
> I think I just have a bad ATA controller.
Given that 2 drives died within 1 second of each other, I'd agree.

> I used --assemble on the real machine and it seems to have detected the raid again.
> One drive was found to be bad and the array is rebuilding onto it now.
> But my data is there and I can open it.
> I am going to get some DVDs and back all this up before it dies again!

OK, that's good :)

A forced assemble makes md assume that all the disks are good and that all
writes succeeded, i.e. that all is well.

They probably didn't, and it probably isn't. On the other hand, you probably
lost a few hundred bytes out of many, many GB, so it's nothing to panic over.

You should fsck and, ideally, checksum-compare your filesystem against a backup.
I would run a read-only fsck before doing anything else; then, if you only have
light damage, repair it.
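
For instance, a sketch assuming an ext3 filesystem directly on /dev/md2 (adjust
if you use LVM or another filesystem), run while it is unmounted:

  fsck -n /dev/md2      # report problems only, change nothing
  fsck /dev/md2         # interactive repair once you've seen the damage

The -n flag makes fsck answer "no" to every repair prompt, so it is safe to run
first.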

David


