linux-raid.vger.kernel.org archive mirror
* Help with failed raid 5
@ 2013-05-04 16:51 Frederick Gnodtke
From: Frederick Gnodtke @ 2013-05-04 16:51 UTC (permalink / raw)
  To: linux-raid

Hi,

I hope someone can help me with this as I have been struggling with it
since this morning.

The following scenario: I have a software RAID 5 created using mdadm.
It consisted of 5 disks, 2000 GB each, with one as a spare drive.
The original mdadm.conf looked like this:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0666 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Mon, 19 Nov 2012 09:30:14 -0800
# by mkconf 3.1.4-1+8efb9d1+squeeze1
ARRAY /dev/md/0 metadata=1.2 spares=1 name=CronosR:0 UUID=c008e97a:aacc7745:c8c49a31:08312d4e


Everything was fine until this morning, when I tried to open a file and
got an I/O error. I rebooted the computer, stopped the raid, and did a
smartctl -t long on all drives belonging to the raid, but they all seem
to run quite well.
Reassembling it using mdadm --assemble --scan --force did not lead to
anything, so I tried to recreate it using "mdadm --create /dev/md0
--assume-clean --raid-devices=5 --level=5 /dev/sd[abdef]".
It created an array, but there was no filesystem to mount.
fsck could not detect a filesystem, and I didn't relocate any "bad
blocks" as I was afraid this might reduce my chance of repairing the
raid to zero.
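
For reference, the superblock dumps further down were captured before the
--create attempt, with something along these lines (reconstructed from
memory, so treat it as approximate):

  for d in /dev/sd[bcdef]; do mdadm --examine "$d"; done > superblocks.txt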

The original superblocks of all disks before recreating the array
looked like this (the raid had already failed when I captured this):

/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : c008e97a:aacc7745:c8c49a31:08312d4e
            Name : CronosR:0  (local to host CronosR)
   Creation Time : Mon Nov 12 10:27:52 2012
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : eec2e852:30e9cbcf:d90a5e2c:e176ee4b

     Update Time : Sat May  4 02:48:00 2013
        Checksum : 4479154a - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : .AA.A ('A' == active, '.' == missing)
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : c008e97a:aacc7745:c8c49a31:08312d4e
            Name : CronosR:0  (local to host CronosR)
   Creation Time : Mon Nov 12 10:27:52 2012
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 703f0adc:b164366b:50653ead:7072d192

     Update Time : Sat May  4 02:48:00 2013
        Checksum : 30c86354 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : .AA.A ('A' == active, '.' == missing)
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : c008e97a:aacc7745:c8c49a31:08312d4e
            Name : CronosR:0  (local to host CronosR)
   Creation Time : Mon Nov 12 10:27:52 2012
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 3995e1e8:3188f2a6:f2afb876:9e024359

     Update Time : Sat May  4 02:48:00 2013
        Checksum : 9537952 - correct
          Events : 836068

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : .AA.A ('A' == active, '.' == missing)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : c008e97a:aacc7745:c8c49a31:08312d4e
            Name : CronosR:0  (local to host CronosR)
   Creation Time : Mon Nov 12 10:27:52 2012
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 27b14550:17bf2abe:14352207:93c20bd3

     Update Time : Sat May  4 02:48:00 2013
        Checksum : 9222113a - correct
          Events : 836068

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : .AA.A ('A' == active, '.' == missing)
/dev/sdf:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : c008e97a:aacc7745:c8c49a31:08312d4e
            Name : CronosR:0  (local to host CronosR)
   Creation Time : Mon Nov 12 10:27:52 2012
      Raid Level : raid5
    Raid Devices : 5

  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
      Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 4bc817b1:8ac80143:798031df:caba2a52

     Update Time : Sat May  4 02:48:00 2013
        Checksum : cef9756a - correct
          Events : 836068

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : .AA.A ('A' == active, '.' == missing)

Is there any chance to recover the raid?
Does anybody have any idea?
As I am just a poor student, I didn't quite have the money to do
backups, so my private data will be lost if the raid cannot be recovered.

I would really appreciate your help!

Thank you all in advance,
Frederick Gnodtke



* Re: Help with failed raid 5
From: Drew @ 2013-05-04 21:59 UTC (permalink / raw)
  To: Frederick Gnodtke; +Cc: linux-raid

The only thing that stands out is that your create statement doesn't
jibe with your disk listing. "mdadm --create /dev/md0 --assume-clean
--raid-devices=5 --level=5 /dev/sd[abdef]" excludes sdc during creation,
whereas the disk listing you post later doesn't mention sda.



-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown


* Re: Help with failed raid 5
From: Phil Turmel @ 2013-05-05 21:52 UTC (permalink / raw)
  To: Frederick Gnodtke; +Cc: linux-raid

Hi Frederick,

On 05/04/2013 12:51 PM, Frederick Gnodtke wrote:
> Hi,
> 
> I hope someone can help me with this as I have been struggling with it
> since this morning.

We may be able to help you.  Some critical information is missing.  For
the record, running raid5 when you have a hot spare available to make it
a raid6 is pretty much insane.
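
For what it's worth, once the array is healthy again mdadm can reshape a
raid5 plus its spare into a raid6 in place.  Roughly, and untested against
your setup:

  # the backup file must live on a disk outside the array
  mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/root/md0-reshape.bak

Check your mdadm man page for the exact requirements before trying it.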

> The following scenario: I have a software RAID 5 created using mdadm.
> It consisted of 5 disks, 2000 GB each, with one as a spare drive.
> The original mdadm.conf looked like this:

[trim /]

> Everything was fine until this morning, when I tried to open a file and
> got an I/O error. I rebooted the computer, stopped the raid, and did a
> smartctl -t long on all drives belonging to the raid, but they all seem
> to run quite well.

> Reassembling it using mdadm --assemble --scan --force did not lead to
> anything, so I tried to recreate it using "mdadm --create /dev/md0
> --assume-clean --raid-devices=5 --level=5 /dev/sd[abdef]".

Really bad choice.  Advice to use "--create --assume-clean" is scattered
around the 'net, but there are terrible pitfalls.
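
If you do experiment with "--create --assume-clean" again, do it against
copy-on-write overlays so nothing more is written to the real disks.  A
rough sketch for one member (the names and overlay size are illustrative,
not verified against your machine):

  # sparse file to absorb any writes, attached via a loop device
  truncate -s 4G /tmp/overlay-sdb
  loop=$(losetup -f --show /tmp/overlay-sdb)

  # expose /dev/sdb through a dm snapshot; all writes land in the overlay
  size=$(blockdev --getsz /dev/sdb)
  echo "0 $size snapshot /dev/sdb $loop N 8" | dmsetup create sdb-cow

Then point mdadm at /dev/mapper/sdb-cow (and the equivalents for the
other members) instead of the raw devices.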

> It created an array, but there was no filesystem to mount.
> fsck could not detect a filesystem, and I didn't relocate any "bad
> blocks" as I was afraid this might reduce my chance of repairing the
> raid to zero.

The device order you specified is certainly wrong, based on your
original superblocks.

> The original superblocks of all disks before recreating the array
> looked like this (the raid had already failed when I captured this):

[trim /]

You don't show the original superblock for /dev/sda.  We need it.

From the given superblocks, your order would be /dev/sd{?,f,e,?,d,?},
where the question marks would be various combinations of a, b, and c.

The roles of sdb and sdc show as spare, either of which could have been
the original spare.

Please look in your system's syslog to see if you can find the raid
assembly report from the last boot before the problem surfaced.  It
would be an alternate source of drive roles.

If you find that in syslog, there's a good chance you will also be able
to find the drive error reports in syslog for the kickout of your
drives.  Show us the excerpts.
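
Something along these lines usually digs them out (on Debian the kernel
messages normally land in /var/log/syslog and /var/log/kern.log; adjust
the paths to your setup):

  grep  -iE 'md[0-9]|md/raid|kicking|I/O error' /var/log/syslog /var/log/kern.log
  zgrep -iE 'md[0-9]|md/raid|kicking|I/O error' /var/log/syslog.*.gz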

> Is there any chance to recover the raid?

Yes.

> Has anybody any idea?

You may have to try multiple combinations of drive orders if it cannot
be figured out from other information.  You *must* not mount your
filesystem until we are certain the order is correct.
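
When it does come to trying orders, the usual pattern is: re-create on the
overlays with the exact parameters from your old superblocks, then check
the filesystem read-only, and only mount once fsck is clean.  Roughly, with
X-cow and Y-cow standing for two of sda/sdb/sdc and assuming an ext3/ext4
filesystem (adjust if yours differs):

  mdadm --stop /dev/md0
  mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 \
        --metadata=1.2 --chunk=512 --layout=left-symmetric \
        /dev/mapper/X-cow /dev/mapper/sdf-cow /dev/mapper/sde-cow \
        /dev/mapper/Y-cow /dev/mapper/sdd-cow
  mdadm --examine /dev/mapper/sdd-cow   # Data Offset should still be 2048 sectors
  fsck.ext4 -n /dev/md0                 # read-only; try the next order if this fails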

> As I am just a poor student, I didn't quite have the money to do
> backups, so my private data will be lost if the raid cannot be recovered.

Excuses don't really matter.  Either we can help you or we can't.
Everyone has limited funds, to some extent.  I recommend you sort your
personal data into what gets backed up and what doesn't.  Most people
can't afford *not* to back up at least part of their data, but don't
realize it until they lose it.

> I would really appreciate your help!

When people report array problems for drives that all appear healthy,
certain suspicions arise.  Please also provide:

1) "smartctl -x" output for each drive.
2) "uname -a" output
3) "mdadm --version" output

Phil

